Abstract An important feature of a dynamic game is its monitoring structure,
namely, what the players effectively see from the played actions. We consider
games with arbitrary monitoring structures. One of the purposes of this paper 
is to determine to what extent an encoder, who perfectly observes the played actions
and sends a complementary public signal to the players, can establish perfect 
monitoring for all the players. To reach this goal, the main technical problem 
to be solved at the encoder is to design a source encoder which compresses the 
action profile in the most concise manner possible. A special feature of this 
encoder is that the multi-dimensional signal (namely, the action profiles) to be 
encoded is assumed to comprise a component whose probability distribution is 
not known to the encoder, and the decoder has side information (the private
signals received by the players when the encoder is off). This new framework 
appears to be both of game-theoretical and information-theoretical interest. In 
particular, it is useful for designing certain types of encoders that are resilient 
to single deviations and provide an equilibrium utility region in the proposed 
setting; it provides a new type of constraint for compressing an information source
(i.e., a random variable). Regarding the first aspect, we apply the derived result 
to the repeated prisoner's dilemma. 
1 Introduction

The set of equilibrium utilities of a non-cooperative dynamic game is 
strongly related to the observation capabilities of the players. For instance, 
in a long-run repeated prisoner's dilemma where the two players do not 
see anything from the played actions (blind players), the only equilib- 
rium point corresponds to the inefficient outcome where both players defect 
[Aumann(1981a)][Sorin(1992)]. On the other hand, when players perfectly observe
all the actions which have been played (perfect monitoring assumption),
efficient equilibria can be sustained; in particular, the social optimum is an 
equilibrium point of the infinitely repeated dilemma or its version with dis- 
count factor. This special case illustrates the potential need for being able to 
transform the monitoring structure of a repeated game into a new one. The 
relevance of such a transformation may appear in other types of settings such 
as stochastic games, multi-agent learning, or networked optimization. For ex- 
ample, perfect monitoring (PM) can be targeted to implement the standard 
fictitious play or best-response algorithms (see e.g., [Peyton(2004)]). The desired
final monitoring structure (i.e., after transformation) does not necessarily
need to be PM and, for example, ensuring that the players observe (thanks to 
the transformation) a certain public signal can be sufficient to obtain efficient 
outcomes for the game. The solution proposed in this paper is to implement 
this monitoring structure transformation by adding an external agent or en- 
coder (whose role is not strategic but only to encode signals and send them 
to the players to improve their observation capability) to the initial game. 
For the sake of clarity and simplicity, the encoder is assumed to perfectly ob- 
serve the actions played and the desired structure, after transformation, is PM. 
Note that PM at the encoder is not always necessary to ensure PM for the 
players (see [LeTreust and Lasaulce(2011a)]). Interestingly, there exist some 
practical scenarios where assuming PM at a terminal is relevant. In wireless 
communications, the decentralized multiple access channel case is known to 
be very important [LeTreust and Lasaulce(2010)]. In this scenario, there is
one receiver (e.g., a WiFi access point or a base station) and several transmitters
(e.g., mobile terminals) which freely choose their transmission policy (say
their power allocation policy) in order to optimize some performance metric 
such as the individual transmission rate. Considering that the base station 
has a computational and observation capability much larger than the mo- 
bile transmitters is a typical assumption in wireless communications (see e.g., 
[Kowalewski(2000)][DaSilva et al(2011)]).
As a consequence, the receiver can, in particular, have the role of an encoder 
which sends a feedback on the played actions to the transmitters. Another 
important scenario of practical interest for which the framework proposed in 
this paper is fully relevant is the case of sensor networks with a fusion center 
(see e.g., [I. F. Akyildiz and Cayirci(2002)]). 

One of the main issues addressed in this paper is the design of an encoder 
which is capable of transforming a monitoring structure by sending comple- 
mentary public signals to the players. The problem comes from the fact that 
the set of public signals has a fixed cardinality. One of the consequences of
this assumption is the existence of an information constraint on the played 
action profiles and more precisely on their distribution, and therefore on the 
feasible players' utilities. As explained further, characterizing this information 
constraint amounts to designing an encoder which represents the information 
source (namely, the action profile) in a manner as concise as possible. How- 
ever, to make the source encoder able to operate at equilibrium (and therefore 
characterize equilibrium utilities), the encoder has to possess a certain prop- 
erty, called the resilience property [LeTreust and Lasaulce(2011b)], which has 
a cost in terms of compression efficiency. In terms of communication, such a 
property ensures that, even when one player uses a distribution on his action 
sequences which is arbitrary and unknown to the encoder, PM remains guar- 
anteed. In strategic terms, if we consider the case of repeated games (which is 
the case study chosen in this paper), it means that grim-trigger-like plans can 
be implemented. 

The paper is structured as follows. Sec. 2 reviews the state of the art on the
problem under investigation. Sec. 3 exploits information-theoretic tools to
derive one of the two central results of this paper, namely the information
constraint (9) stated in Theorem 1, and explains how this constraint translates into
a set of action profile distributions (and therefore into feasible utilities) that 
are compatible with the perfect monitoring assumption. Sec. 4 provides the 
second important result, stated in Theorem 2, which is an achievable equilib- 
rium utility region for encoder-assisted infinitely repeated games with signals. 
The paper is concluded in Sec. 5. 

2 Related works 

Before mentioning some relevant works related to the one reported here, it is 
useful to define at this point a monitoring structure. A monitoring structure 
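is a conditional or transition probability defined by:

$$\daleth : \mathcal{A} \to \Delta(\mathcal{S}) \qquad (1)$$

where $\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_K$ is the discrete set of action profiles, $K$ is
the number of players, $\mathcal{A}_k$ is the discrete set of actions of player $k \in \mathcal{K} =
\{1, 2, \ldots, K\}$, $\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_K$, $\mathcal{S}_k$ is the discrete set of signals received by
player $k$, and the notation $\Delta(\mathcal{S})$ stands for the set of probability distributions
on $\mathcal{S}$ (unit simplex).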
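
As an illustration, a monitoring structure on finite sets can be stored as a table that maps each action profile to a distribution over signal profiles. The sketch below is a hypothetical two-player binary example and is not taken from the paper: the function names, the noise parameter `flip`, and the assumption that the two players' observation noises are independent are all illustrative choices.

```python
from itertools import product


def make_monitoring(flip):
    """Hypothetical kernel: each player observes the opponent's binary action,
    flipped with probability `flip`; the two noises are assumed independent."""
    actions = (0, 1)
    kernel = {}
    for a1, a2 in product(actions, repeat=2):
        dist = {}
        for s1, s2 in product(actions, repeat=2):
            p1 = 1 - flip if s1 == a2 else flip  # player 1 watches a2
            p2 = 1 - flip if s2 == a1 else flip  # player 2 watches a1
            dist[(s1, s2)] = p1 * p2
        kernel[(a1, a2)] = dist
    return kernel


def is_monitoring_structure(kernel, tol=1e-12):
    """Check that every row is a probability distribution on signal profiles."""
    return all(
        min(dist.values()) >= 0 and abs(sum(dist.values()) - 1.0) < tol
        for dist in kernel.values()
    )


def private_marginal(kernel, k, profile):
    """Distribution of player k's own signal (k = 0 or 1) given an action profile."""
    out = {}
    for signals, prob in kernel[profile].items():
        out[signals[k]] = out.get(signals[k], 0.0) + prob
    return out
```

Summing the joint signal distribution over the other player's signal, as `private_marginal` does, gives what a single player actually observes.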

The first relevant body of related works comprises papers providing lossless 
[Shannon(1948)] and zero-error [Shannon(1956)][Witsenhausen(1976)] source
coding theorems. Indeed, the role of the encoder in this paper is to encode a 
sequence (or block) of action profiles into a sequence of public signals which is 
observed by the players. As already mentioned, doing this in a concise manner
is of prime interest to characterize the information constraint. The considered
source coding problem has two main features: the decoders (namely,
the players) have side information on the source (the private signal) and we
want the encoder to be resilient to single deviations, that is, the past action
profiles are decoded reliably even when the probability distribution of the action
of a given player varies arbitrarily over time and remains unknown to
both the encoder and decoders. Remarkably, the information theory literature
provides the right framework to design such encoders. The corresponding
framework is the one of arbitrarily varying sources (AVS): the source distribution
$\mathcal{P}_v(a) \in \Delta(\mathcal{A})$ can vary from sample to sample, depending on a parameter
or state $v \in \mathcal{V}$ which represents, in our setting, the probability distribution
of the deviator's action. The most relevant work on AVS is based on graph
coloring [Bondy and Murty(1976)] and can be found in [Ahlswede(1979)] and
[Ahlswede(1980)]. Indeed, the latter references deal with the scenario of two
correlated sources either in the case where the destination is informed of
the sequence of states or in the case where it is not. The work reported
in this paper is precisely related to the scenario of two arbitrarily varying
correlated sources of actions $a$ and private signals $s_k$ with a destination
(i.e., player $k \in \mathcal{K}$) uninformed of the state (i.e., the strategy of a possible
deviator); this scenario is described by Fig. 1. One of our contributions, in
addition to establishing a link between equilibrium utility regions and the
AVS literature, is to show that the entropy positiveness condition (EPC) in
[Ahlswede(1979)][Ahlswede(1980)], under which source coding rates (i.e., optimal
compression level) can be characterized, can be removed and replaced
with another mathematical condition which is of strong game-theoretic interest,
namely the resilience property. Additionally, our result holds for some useful
special cases for which the EPC is not met, the case of deterministic channels
in particular.

The second body of works concerns folk theorems. The strongest
results have been obtained for one of the simplest classes of dynamic
games, namely the one of repeated games (see e.g., [Sorin(1992)] for a
survey). The standard approach consists in assuming a given monitoring
structure (e.g., standard-trivial monitoring [Lehrer(1991)], public monitoring
[Fudenberg et al(1994)], or almost-perfect
monitoring [Horner and Olszewski(2006)]) and, then, deriving a folk theorem.
Compared to these works, our approach is different since we do not try to characterize
the equilibrium utilities of a repeated game with an arbitrary monitoring
structure (which is an open problem [Renault and Tomala(2011)]). Rather,
our approach aims at transforming, with an additional encoder, an arbitrary 
monitoring structure of any dynamic game into a new monitoring structure 
for which the equilibrium utilities can be fully characterized ; in this paper, 
PM is the targeted final structure. Even though the final monitoring structure 
is PM, there are still some differences between a dynamic game with PM (the 
focus will be on repeated games here) and a dynamic game where players have 
PM thanks to the encoder : 

— there exists an internal information constraint on the action distribution 
due to the fact that the set of public signals has a fixed cardinality ; 
— action profiles are encoded by blocks by the encoder and each player decodes
a block of played actions from a whole block of observations. Therefore,
PM is established with a delay ;

— only i.i.d equilibrium utilities (and convex combinations of them) are stud- 
ied. This assumption on the action profiles is well motivated in the paper 
and does not prevent us from deriving useful results which may be extended 
if needed. 

For all of these reasons, we will use the term "virtually perfect monitoring" 
(VPM) to refer to such a framework. 

To conclude on the most relevant references related to the work re- 
ported in this paper, we will mention a couple of references at the intersec- 
tion between game and information theory. For instance, in [Lehrer(1988)], 
[Bavly and Neyman(2003)], [Peretz(2011)], entropy-based information constraints
are used to characterize the individually rational levels of repeated
games with bounded recall. In [Gossner and Tomala(2007)], the authors char- 
acterize the maximum utility a team can guarantee against another in a 
class of repeated games with imperfect monitoring by exploiting a constraint 
on possible correlation schemes expressed in terms of entropy variation. In 
[Gossner et al(2006)], the authors exploit
an information constraint in the sense of the present work, that is,
the source coding rate has to be less than the channel capacity, although the 
constraint is not interpreted this way in their work. This leads to a charac- 
terization of equilibrium utilities a team of two players can implement when 
only one player is (noncausally) informed of the i.i.d. sequence of states of the 
repeated game. 

3 Virtual perfect monitoring of an arbitrarily varying information 
source 

3.1 Methodology 

The scenario under consideration is as follows (see Fig. 1). Let us fix a
family of probability distributions $\mathcal{P}_k^\star \in \Delta(\mathcal{A}_k)$ with $k \in \mathcal{K}$. When a given
action profile $a = (a_1, a_2, \ldots, a_K) \in \mathcal{A}$ is drawn from the product probability
$\mathcal{P}^\star = \mathcal{P}_1^\star \otimes \cdots \otimes \mathcal{P}_K^\star \in \Delta(\mathcal{A})$, player $k \in \mathcal{K}$ receives a symbol $s_k \in \mathcal{S}_k$ with a
probability given by the conditional probability

$$\daleth(s_k | a) = \sum_{s_{-k} \in \mathcal{S}_{-k}} \daleth(s_k, s_{-k} | a). \qquad (2)$$

An encoder C, who perfectly monitors the played actions, encodes the observed
action profiles by blocks or sequences of size $n \geq 1$ into a sequence of public
signals $s_0 \in \mathcal{S}_0$ which are received by all the players. These public signals form
a perfect channel of capacity $\log_2 |\mathcal{S}_0|$, which is orthogonal to the one defined
by $\daleth$, that is, player $k$ actually receives a pair of signals $(s_k, s_0)$ for every played
action profile. Note that player $k$ recalls its own action $a_k$. The purpose of the
encoder is to use the minimal amount of additional information in order for
every player to acquire the information which is missing to have PM. In what
follows, we first define a code in our setup. Second, we define the notion of
virtually perfect monitoring (VPM) of the action profile $a = (a_1, a_2, \ldots, a_K) \in \mathcal{A}$,
defined as an arbitrarily varying information source (AVS). Third, we prove
a theorem which states an information constraint on the action profile distribution,
which is due to the fact that the communication channel between the
encoder and players has a limited capacity. Denote by $\mathcal{A}^n$ (resp. $\mathcal{A}^\infty$) the set of
sequences $a^n$ of length $n \in \mathbb{N}$ (resp. of sequences $a^\infty$ of infinite
length).




Fig. 1 Each action profile of the game $a = (a_1, a_2, \ldots, a_K)$ generates a signal profile
$(s_1, s_2, \ldots, s_K)$ through a conditional probability $\daleth$. Player $k$ (represented twice above)
only observes $s_k$ from this action profile $a$. The encoder C, who perfectly monitors the played
action profiles $a$, builds a complementary public signal $s_0$ which is observed by all the players.
Each player has to reconstruct virtual perfect monitoring (VPM) from a sequence of pairs
of signals $(s_k, s_0)$ and the knowledge of the sequence of its individual actions $a_k$.



3.2 Information constraint for resilient coding with side information at the 
decoder 

Here, we assume that the distribution of the source may vary from stage (or 
action profile) to stage (or action profile) ; this is the framework of arbitrarily 
varying source (AVS) coding. 
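Definition 1 (Arbitrarily Varying Source (AVS)) Let $\mathcal{P}^\star \in \Delta(\mathcal{A})$ be a
probability distribution (mixed strategy) and $\mathcal{V}$ the set of states of the source:

$$\mathcal{V} = \bigcup_{k \in \mathcal{K}} \Delta(\mathcal{A}_k^\infty). \qquad (3)$$

The arbitrarily varying (AVS) information source $a \in \mathcal{A}$ is at a certain state
$v \in \mathcal{V}$ when one component of the action profile has a distribution which may
vary arbitrarily over time and is fully unknown to the coder. For example,
when the sequence of states is $v = \mathcal{Q}_i \in \Delta(\mathcal{A}_i^\infty) \subset \mathcal{V}$, the sequence of
actions $a^n = (a_1^n, \ldots, a_K^n)$ is drawn following a probability distribution given
by:

$$\mathcal{P}_v(a_1^n, \ldots, a_i^n, \ldots, a_K^n) = \mathcal{Q}_i(a_i^n) \prod_{j \in \mathcal{K},\, j \neq i} \mathcal{P}_j^{\star \otimes n}(a_j^n), \qquad (4)$$

where $\mathcal{P}_j^{\star \otimes n}$ denotes the $n$-fold i.i.d. product of $\mathcal{P}_j^\star$.

Now, we formally define the notion of code for the AVS represented by Fig. 1.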

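Definition 2 A code $\lambda$ of size $n$ for the encoder C and decoders $\mathcal{K}$ consists of
an encoding function $f_0$ and $K$ decoding functions $(g_k)_{k \in \mathcal{K}}$ defined as:

$$f_0 : \mathcal{A}^n \to \mathcal{S}_0^n, \qquad g_k : \mathcal{S}_0^n \times \mathcal{S}_k^n \times \mathcal{A}_k^n \to \mathcal{A}^n, \ \forall k \in \mathcal{K}. \qquad (5)$$

Denote by $\Lambda(n)$ the set of codes for which the length $n \in \mathbb{N}$ of the codewords
is fixed. The error probability $\mathcal{P}_e(\lambda)$ of the code $\lambda \in \Lambda(n)$ is defined by

$$\mathcal{P}_e(\lambda) = \max_{k \in \mathcal{K}} \max_{v \in \mathcal{V}} \mathcal{P}_v\big(a^n \neq g_k(s_0^n, s_k^n, a_k^n)\big) \qquad (6)$$

and corresponds to the worst-case error probability over the decoders $k \in \mathcal{K}$,
considering every possible deviation of a player $i \in \mathcal{K}$ (i.e., any
variation of the source).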

Definition 3 (Virtually Perfect Monitoring (VPM)) Players $\mathcal{K}$ have a
virtually perfect monitoring (VPM) of the information source $a \in \mathcal{A}$ if for all
$\varepsilon > 0$, there exist a parameter $n \in \mathbb{N}$ and a code $\lambda \in \Lambda(n)$ such that:

$$\mathcal{P}_e(\lambda) \leq \varepsilon. \qquad (7)$$

The condition (7) means that it is possible to find coding and decoding
functions to represent any sequence of $n$ realizations of the $K$-dimensional
random variable $a$ with $2^{n \log_2 |\mathcal{S}_0|}$ indices or sequences of public signals in
such a way that any decoder $k$, based on the knowledge of $(s_0^n, s_k^n, a_k^n)$, can
find the sequence $a^n$ with an arbitrarily small probability of error. In a
game-theoretic framework, the players virtually perfectly monitor the sequences of
past actions played.

At this point, the main issue is to characterize the set of AVS
information sources that are compatible with the VPM of the players $\mathcal{K}$. The
AVS hypothesis guarantees that the past actions played will be observed by
all the players even if one of them deviates or manipulates the coding scheme
$\lambda \in \Lambda(n)$ in order to break reliability. Theorem 1 provides an information
constraint which guarantees VPM for the AVS information source of the players'
actions $a$. To state this theorem, an auxiliary graph needs to be defined first.

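Definition 4 (Auxiliary graph) For each player $i \in \mathcal{K}$, an auxiliary graph
$\mathcal{G}_i = (\mathcal{A}_i, \mathcal{E}_i)$ is defined as follows. The actions $a_i \in \mathcal{A}_i$ of player $i \in \mathcal{K}$ are
the vertices of the graph. There exists an edge $e_i = (a_i, a_i') \in \mathcal{E}_i$ between two
actions $a_i \in \mathcal{A}_i$ and $a_i' \in \mathcal{A}_i$ if:

$$\exists a_{-i} \in \mathrm{Supp}\, \mathcal{P}_{-i}^\star,\ \exists k \in \mathcal{K},\ \exists s_k \in \mathcal{S}_k,\ \exists \delta > 0,\ \text{s.t.} \quad \min\big(\daleth(s_k | a_i, a_{-i}),\ \daleth(s_k | a_i', a_{-i})\big) \geq \delta,$$

where $\mathrm{Supp}\, \mathcal{P}_{-i}^\star$ is the support of the probability distribution $\mathcal{P}_{-i}^\star$ defined by

$$\mathcal{P}_{-i}^\star = \bigotimes_{j \neq i} \mathcal{P}_j^\star \in \Delta\Big(\prod_{j \neq i} \mathcal{A}_j\Big).$$

Two vertices $a_i \in \mathcal{A}_i$ and $a_i' \in \mathcal{A}_i$ are neighbors in the graph $\mathcal{G}_i$ if the probability
that these actions lead, through $\daleth$, to the same signal $s_k \in \mathcal{S}_k$ for
at least one player $k \in \mathcal{K}$ is not zero. Now, to define the chromatic number
[Bondy and Murty(1976)] of the graph $\mathcal{G}_i$, we define the notion of coloring in
our context.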

Definition 5 (Coloring) Let $\Phi_i$ be a set of colors. A coloring of the graph $\mathcal{G}_i$
is a function $\phi_i : \mathcal{A}_i \to \Phi_i$ which satisfies:

$$\forall e_i = (a_i, a_i') \in \mathcal{E}_i, \quad \phi_i(a_i) \neq \phi_i(a_i'). \qquad (8)$$

A minimal coloring of the graph $\mathcal{G}_i$ is a coloring $\phi_i$ for which the cardinality
of the set of colors is minimal. The chromatic number $\chi_i$ of the graph $\mathcal{G}_i$
is the cardinality $|\Phi_i|$ of the set of colors of the minimal coloring of the graph
$\mathcal{G}_i$. It is precisely this quantity which is used in the next theorem.
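
For small action sets, the coloring condition (8) and the chromatic number can be checked by exhaustive search. The following is a minimal sketch under that assumption; the function names are hypothetical, the graph is passed as an explicit edge list, and the brute-force search is only meant for a handful of vertices (it is exponential in the number of actions).

```python
from itertools import product


def is_coloring(edges, coloring):
    """Condition (8): the two endpoints of every edge get different colors."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices, edges):
    """Smallest number of colors admitting a valid coloring (brute force)."""
    vertices = list(vertices)
    if not vertices:
        return 0
    for n_colors in range(1, len(vertices) + 1):
        for assignment in product(range(n_colors), repeat=len(vertices)):
            coloring = dict(zip(vertices, assignment))
            if is_coloring(edges, coloring):
                return n_colors
    return len(vertices)  # never reached: one color per vertex always works
```

With two actions joined by an edge, two colors are needed, so the term $\log_2 \chi_i$ equals one bit; without the edge a single color suffices and the term vanishes.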

Theorem 1 (Coding result for AVS) Players $\mathcal{K}$ have a virtually perfect
monitoring (VPM) of the arbitrarily varying (AVS) information source $a \in \mathcal{A}$
if the following condition is met:

$$R^\star = \max_{i \in \mathcal{K}} \Big[ \max_{k \in \mathcal{K},\, a_i \in \mathcal{A}_i} H(a_{-i,k} | s_k(a_i), a_k) + \log_2 \chi_i \Big] < \log_2 |\mathcal{S}_0|, \qquad (9)$$

where:

• $a_{-i,k}$ is the action profile without the components $i$ and $k$. It is distributed
according to the product distribution $\bigotimes_{j \notin \{i,k\}} \mathcal{P}_j^\star$;

• $s_k(a_i)$ is the signal received by player $k$ when the action $a_i$ is fixed. It is
induced by $a_{-i}$ and the transition probability $\daleth$:

$$\daleth_{a_i} : \mathcal{A}_{-i} \to \Delta(\mathcal{S}_k) \qquad (10)$$

$$a_{-i} \mapsto \daleth_{a_i}(s_k | a_{-i}) = \daleth(s_k | a_i, a_{-i}) = \sum_{s_{-k}} \daleth(s_k, s_{-k} | a_i, a_{-i}); \qquad (11)$$

• $\log_2 |\mathcal{S}_0|$ is given by the cardinality of the set of public signals and corresponds
to the capacity of the perfect channel between the encoder C and the
players $\mathcal{K}$.

Several comments are in order. First, let us comment on the main assump- 
tions. The i.i.d assumption over time made on the source to be encoded is 
common in the information theory literature and will only be briefly com- 
mented. Solving the i.i.d. case might not only be helpful but even sufficient for 
solving the case with arbitrary correlation between consecutive source samples. 
To be more specific, if the source generates $B$ blocks of $\ell$ correlated symbols
for $B$ sufficiently large, $\ell < +\infty$, and i.i.d. blocks, then the information constraint
directly follows from the original i.i.d case (concerning i.i.d symbols)
by considering vectors of symbols instead of symbols. Beyond this framework, 
the source coding literature comprises works dealing with refinements such 
as universal coding [Gallager(1976)] and information-spectrum based coding 
[Han(2003)]. Now, from a game-theoretic perspective, studying sequences of
i.i.d profiles (up to one component) is not only an intermediate case which can
be technically challenging (think of repeated games with arbitrary monitoring
structures) but is also useful to design implementable equilibrium action plans. As
for relaxing the i.i.d assumption over space (over the components), provided 
the resilience property is relaxed and the joint distribution on the actions is 
known to the encoder, it only consists in changing scalar quantities into vec- 
tors (of size K). When resilience to single deviations is required, the spatial 
i.i.d assumption is useful to derive information constraint (as advocated by 
the proof provided in App. A) but studying necessity is a possible extension 
of this work. Finally, note that the spatial i.i.d assumption allows one to study
mixed strategies, which is known to be important.

Now, let us comment on the result, i.e., the information constraint defined
by (9). The presence of the maximum over $i$ is due to the fact that the location
of the component (which corresponds to the deviator in a game), whose
distribution is unknown, is itself unknown to C. The second maximum over $k$
and $a_i$ indicates the case where the deviator $i$ chooses the worst action $a_i$ in
terms of coding efficiency for the worst decoder $k$. The conditioning w.r.t.
$(s_k(a_i), a_k)$ in the entropy translates the knowledge of the decoder in terms
of side information, which therefore reduces the entropy. The isolated term
$\log_2 \chi_i$ corresponds to the amount of information needed by C to encode a
component separately; since the probability distribution of $a_i$ is unknown,
symbol-by-symbol coding is optimal here. Without side information at the
decoder, this quantity would be $\log_2 |\mathcal{A}_i|$. Finally, the right-hand side term
$\log_2 |\mathcal{S}_0|$ corresponds to the channel capacity of a broadcast channel with a
public message and for which the decoders directly observe the signal sent by
the encoder (see e.g., [Cover and Thomas(2006)]).

To conclude this section, let us comment on the proof of this theorem.
Although the detailed proof of this theorem is provided in App. A, we would
like to mention here some technical differences w.r.t. the derivation made by
Ahlswede in [Ahlswede(1980)]. The imposed condition is totally different. Imposing
resilience to single deviations on the source encoder requires transmitting
without error the sequence of actions of the deviating player. Our proof
is based on a sequence of colorings where the vertices are the symbols whereas
Ahlswede [Ahlswede(1980)] uses a coloring where the vertices are the sequences
of symbols. To exploit the law of large numbers for the sequences of symbols,
his proof requires an additional condition, which is the EPC. In our framework,
this condition is removed and replaced with a condition over the admissible
sequences of states (3) and by the feature that the random signal $s$ depends
on the state $v$ only through the action $a$. Our result is applicable to the case of
a deterministic transition probability $\daleth$, whereas this special type of transition
probability does not meet the EPC.

4 Equilibrium utilities of encoder-assisted repeated games with
signals

The goal of this section is to characterize equilibrium utilities of an infinitely
repeated game with signals where an additional encoder establishes VPM. To 
this end, notations, definitions and results of the preceding section are used. 

4.1 Game formulation and main result 

We consider an encoder-assisted repeated game with signals. The stage or constituent
game is given by the triplet $(\mathcal{K}, (\mathcal{A}_k)_{k \in \mathcal{K}}, (u_k)_{k \in \mathcal{K}})$, where $u_k : \mathcal{A} \to \mathbb{R}$ is
the utility function of player $k \in \mathcal{K}$. The private monitoring structure is given
by the conditional probability $\daleth : \mathcal{A} \to \Delta(\mathcal{S})$. The encoder C is assumed
to perfectly monitor the past action profile $a \in \mathcal{A}$ and send a public message
$s_0 \in \mathcal{S}_0$ to the players.

A strategy for the encoder (by abuse of language we use the term strategy
here even though in this paper the encoder has no utility in the game-theoretic
sense) is a sequence of causal functions or mappings $\sigma = (\sigma^t)_{t \geq 1}$ with
$\forall t \geq 1$, $\sigma^t : \mathcal{A}^{t-1} \times \mathcal{S}_0^{t-1} \to \mathcal{S}_0$; $t$ stands for the stage index and $a_t$ is the
profile played at stage $t$; the set of strategies of the encoder will be denoted
by $\Sigma$.

A behavior strategy for a player is a sequence of causal functions or mappings
$(\tau_k^t)_{t \geq 1}$ with $\forall t \geq 1$, $\tau_k^t : (\mathcal{A}_k \times \mathcal{S}_k \times \mathcal{S}_0)^{t-1} \to \Delta(\mathcal{A}_k)$; the notation
$\tau = (\tau_1, \tau_2, \ldots, \tau_K)$ will stand for a profile of behavior strategies for the repeated
game; the set of behavior strategies will be denoted by $\mathcal{T} = \bigotimes_{k \in \mathcal{K}} \mathcal{T}_k$.
Finally, we will denote by $\mathcal{P}_{\sigma,\tau}$ the probability distribution on the infinite
sequences of actions, private and public signals $((a_k^\infty)_{k \in \mathcal{K}}, (s_k^\infty)_{k \in \mathcal{K}}, s_0^\infty) \in
\mathcal{A}^\infty \times \mathcal{S}^\infty \times \mathcal{S}_0^\infty$ induced by the pair of strategies $(\sigma, \tau) \in \Sigma \times \mathcal{T}$. At this
point, one can define a uniform equilibrium of the encoder-assisted repeated
game with signals.

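Definition 6 (Equilibrium points) A pair of strategies $(\sigma, \tau) \in \Sigma \times \mathcal{T}$
of the encoder C and the players $\mathcal{K}$ is a uniform equilibrium of the encoder-assisted
repeated game with signals if:

(i) For each player $k \in \mathcal{K}$, the expected utility,

$$\gamma_k^T(\sigma, \tau) = \mathbb{E}_{\sigma,\tau}\Big[\frac{1}{T} \sum_{t=1}^{T} u_k(a_t)\Big], \qquad (12)$$

has a limit when $T \to +\infty$;

(ii) $\forall \varepsilon > 0$, $\exists \bar{T} > 0$, $\forall T \geq \bar{T}$, $\forall k \in \mathcal{K}$, $\forall \tau_k' \in \mathcal{T}_k$,

$$\gamma_k^T(\sigma, \tau_k', \tau_{-k}) \leq \gamma_k^T(\sigma, \tau) + \varepsilon. \qquad (13)$$

The point $U^\star = (U_1^\star, U_2^\star, \ldots, U_K^\star) \in \mathbb{R}^K$ is a vector of equilibrium utilities if
there exists a pair of strategies $(\sigma^\star, \tau^\star)$ such that:

$$\forall k \in \mathcal{K}, \quad \lim_{T \to +\infty} \gamma_k^T(\sigma^\star, \tau^\star) = U_k^\star. \qquad (14)$$

The set of the equilibrium points of the encoder-assisted repeated game with
signals will be denoted by $NE^\infty$.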

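Definition 7 (Individually rational points) The independent min-max
level $v_k$ of player $k \in \mathcal{K}$ is defined by (15) and is also called punishment or
defense level. The individually rational (IR) utilities are defined by (16) and
correspond to the utilities that Pareto-dominate the min-max levels:

$$v_k = \min_{\mathcal{P}_{-k} \in \bigotimes_{j \neq k} \Delta(\mathcal{A}_j)} \ \max_{\mathcal{P}_k \in \Delta(\mathcal{A}_k)} \mathbb{E}_{\mathcal{P}_k, \mathcal{P}_{-k}}\big[u_k(a_k, a_{-k})\big], \quad k \in \mathcal{K}, \qquad (15)$$

$$IR = \big\{ (x_k)_{k \in \mathcal{K}} \in \mathbb{R}^K \ : \ x_k \geq v_k, \ \forall k \in \mathcal{K} \big\}. \qquad (16)$$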


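Definition 8 (Information constraint set) The set $\mathcal{R}$ of mixed actions
that satisfy the information constraint (9) is defined by:

$$\mathcal{R} = \Big\{ \mathcal{P} \in \bigotimes_{k \in \mathcal{K}} \Delta(\mathcal{A}_k) \ : \ \max_{i \in \mathcal{K}} \Big[ \max_{k \in \mathcal{K},\, a_i \in \mathcal{A}_i} H(a_{-i,k} | s_k(a_i), a_k) + \log_2 \chi_i \Big] < \log_2 |\mathcal{S}_0| \Big\}. \qquad (17)$$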

Theorem 2 (Folk theorem with VPM) The set of utilities $\mathrm{conv}\, u(\mathcal{R}) \cap IR$
is included in the set of uniform equilibrium utilities for the encoder-assisted
repeated game with signals:

$$\mathrm{conv}\, u(\mathcal{R}) \cap IR \subset NE^\infty. \qquad (18)$$

Moreover, for any utility vector in this set, VPM can be implemented by the
encoder C and the players $\mathcal{K}$.
The proof is provided in App. B and is based on Theorem 1. The framework
of AVS is exploited to characterize the communication possibilities for the
encoder. The first feature of the problem is that the coding scheme must be 
reliable even if one of the players deviates. The second main feature is that 
the coding scheme must also take into account the private signals received 
by the players. These two hypotheses allow us to determine the amount of 
additional information needed from the encoder in order to implement VPM. 
Interestingly, the proof of Theorem 2 relies on classical grim-trigger strategies 
but implemented in a blockwise manner and by exploiting VPM and strong 
typicality [Cover and Thomas(2006)] as a statistical test whose result indicates 
to every player whether to keep on following the main plan. 
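
The statistical test mentioned above can be sketched as follows. This is a simplified illustration, not the construction used in the proof: the function names are hypothetical, the threshold `eps / |alphabet|` follows the usual textbook form of strong typicality and is an assumption here, and the decision is shown for a single decoded block only.

```python
from collections import Counter


def is_strongly_typical(block, target, eps):
    """Return True if the empirical frequencies of `block` are close to `target`.

    `target` maps each symbol (e.g. an action profile) to its prescribed
    probability under the main plan.
    """
    n = len(block)
    if n == 0:
        raise ValueError("empty block")
    counts = Counter(block)
    # A symbol that the plan gives probability zero rules the block out.
    if any(target.get(sym, 0.0) == 0.0 for sym in counts):
        return False
    tolerance = eps / len(target)
    return all(abs(counts[sym] / n - p) < tolerance for sym, p in target.items())


def next_phase(decoded_block, target, eps):
    """Keep the main plan only if the decoded block passes the typicality test."""
    return "main" if is_strongly_typical(decoded_block, target, eps) else "punish"
```

Because VPM lets every player decode the same block of past action profiles, all players can run the same test and reach the same decision.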



4.2 Application to the repeated prisoner's dilemma 

We consider a prisoner's dilemma whose matrix form is given by Tab. 1. Let
$|\mathcal{A}| = 4$ and $|\mathcal{S}_0| = 3$. Note that, since $|\mathcal{S}_0| < |\mathcal{A}|$, the encoder cannot send the action
profile directly to the players. The goal of this section is to describe the mixed
strategies $\mathcal{P}^\star \in \Delta(\mathcal{A})$ that are compatible with the information constraint
(9). If this constraint is satisfied, the encoder can compress the sequence of
past actions, encode it into a sequence of public signals and the players can
decode the sequence of past actions with an error probability that goes to zero
when the length of the sequences goes to infinity. Denote $\mathcal{A}_1 = \{T, B\}$ and
$\mathcal{A}_2 = \{L, R\}$.

            L         R
    T     (3, 3)    (0, 4)
    B     (4, 0)    (1, 1)

Table 1 The prisoner's dilemma in a matrix form.




Fig. 2 This figure illustrates the encoder-assisted monitoring structure for the repeated
version of the prisoner's dilemma. Theorem 2 provides a set of mixed strategies
$\mathcal{P}^\star \in \Delta(\mathcal{A})$ that allow the encoder C to establish VPM and the players $\mathcal{K}$ to implement an
equilibrium strategy.
To have a better understanding of how the results derived in Sec. 3 and 4.1
are exploited here, we consider a particular monitoring structure $\daleth$ described
by Fig. 3 with $\delta \in [0, 1]$. This means that if the action $a_{-k} \in \mathcal{A}_{-k}$ was played,
player $k \in \{1, 2\}$ observes the right signal $s_k \in \mathcal{S}_k$ with probability $1 - \frac{\delta}{2}$ and
observes the wrong signal $s_k' \in \mathcal{S}_k$ with probability $\frac{\delta}{2}$. When $\delta = 0$, all the
players have perfect monitoring. On the other hand, when $\delta = 1$, they cannot
distinguish anything from the signal they observe (trivial monitoring).

Fig. 3 Private monitoring structure $\daleth$ that depends on the parameter $\delta \in [0, 1]$

For this monitoring structure $\daleth(s_1, s_2 | a_1, a_2)$ with $\delta \in [0, 1]$, we want to determine the
set $\mathrm{conv}\, u(\mathcal{R}) \cap IR$ of utility profiles which are compatible with the information
constraint (9). For the scenario under investigation, the information constraint
(9) for $\delta > 0$ rewrites as:
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(a) 



: maxigK max feeK. -H'(a_i.fc |sfc (a^), aj.) + logj Xi 



H(a2|si) + log2 Xi> 
//(ai|s2) + log2 X2, 



< log2 I -So I 

< log2 I -So I 



(b) 

max 



E»2eA2, •P*(a2)n(si|a2)log2 (a2)-|(^i 1^2) + I0S2X1, 

Eaie^i, P*(ai)-I(s2|ai)log2 °i)1(s2|ai) + log2 X2 



< log2 l'5ol 



7'*(a2)(l- f)-lo 
+ P*(a^)f-log2 
+ T'*(a2)f -loga 
+ p*(a^)(l_ |).log2 



y*(''2)(l-|) + y*(a^)j 
•P*{<i2){l-f) 
'P*(a2)(l-|) + -P*{4)i' 

' y* (a2)|+y*(a2)(l-f ) ' 
T'*(«2)f 
-P*(''2)|+P*(4)(l-|) 



+ P*(a;)|-log2 
+ P*(ai)f-log2 
+ p*(a;)(l- |).log2 



P*(ai)(i-|)+-P*{«'i); 

p*(ii)|+y*(tt'i)(i-f) ' 

7:>*(ai)|+p*(ai)(l-|) 
P*(«i)(l-f) 



< log2 ISol - 1 



where (a) follows from the fact that $a_2$ and $s_1$ are independent of $a_1$, so that the entropy $H(a_2|s_1,a_1)$ reduces to $H(a_2|s_1)$. By the same argument, $H(a_1|s_2,a_2)$ reduces to $H(a_1|s_2)$. (b) follows from the definition of the conditional entropy and (c) follows from the fact that the chromatic numbers of the graphs $\mathcal{G}_1$, $\mathcal{G}_2$ of both players are equal to $\chi_1 = \chi_2 = 2$ as soon as $\delta > 0$.
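The last form of the constraint can be evaluated numerically. The following Python sketch is illustrative only: the function names `cond_entropy` and `satisfies_constraint` are hypothetical, each player is assumed to have two actions, and the public alphabet size $|\mathcal{S}_0| = 3$ is used as a default. It computes the conditional entropy of an opponent's action given the noisy private signal and checks it against $\log_2 |\mathcal{S}_0| - 1$.

```python
import math

def cond_entropy(p, delta):
    """H(a | s) in bits for a binary action with P*(a) = p, observed through
    a signal that is wrong with probability delta / 2."""
    flip = delta / 2.0
    # Joint probabilities of (action, signal); signal 0 is the right one for action 0.
    joint = {
        (0, 0): p * (1 - flip), (0, 1): p * flip,
        (1, 0): (1 - p) * flip, (1, 1): (1 - p) * (1 - flip),
    }
    total = 0.0
    for s in (0, 1):
        p_s = joint[(0, s)] + joint[(1, s)]
        for a in (0, 1):
            q = joint[(a, s)]
            if q > 0:
                total += q * math.log2(p_s / q)
    return total

def satisfies_constraint(p1, p2, delta, n_public=3):
    """Check max(H(a_2|s_1), H(a_1|s_2)) < log2(n_public) - 1,
    i.e. the constraint with both chromatic numbers equal to 2 (delta > 0)."""
    lhs = max(cond_entropy(p2, delta), cond_entropy(p1, delta))
    return lhs < math.log2(n_public) - 1
```

For instance, `satisfies_constraint(0.5, 0.5, 0.2)` asks whether uniformly mixed actions are compatible with the constraint when $\delta = 0.2$.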

Setting $\delta$ to 1, 0.35, 0.31 and 0.2, the above information constraint can be translated into Fig. 4. This figure represents the set of feasible average utility profiles which are both individually rational and compatible with the information constraint (9). Let us interpret these numerical results, which depend on the precision parameter $\delta \in [0, 1]$ of the private monitoring $\daleth$.

o Trivial monitoring: $\delta = 1$. The players get no information from their private signal about the actions of their opponent. Theorem 2 shows that, for the utility vectors represented by the blue hatched region $\operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$, the encoder is able to send to both players the sequences of past actions (with $|\mathcal{A}| = 4$) using an alphabet of $|\mathcal{S}_0| = 3$ symbols of public signals.

o Noisy imperfect monitoring: $0.31 < \delta < 1$. The private signals received by the players reveal partial information about the past actions of the opponent. Only a portion of the utility region is compatible with the information constraint (9). The virtual perfect monitoring and the equilibrium condition are not always implementable.
[Figure 4: one panel per value of $\delta$; horizontal axis: utility of player 1. Legend: Nash equilibrium utility, Pareto-optimal utility, social optimum utility, min-max levels, deviation utilities, $\operatorname{conv} u(\mathcal{A})$, $u(\mathcal{R}) \cap \mathrm{IR}$, $\operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$.]

Fig. 4 The repeated version of the prisoner's dilemma is considered. The encoder-assisted monitoring structure of the game is described in Fig. 2, where the private monitoring $\daleth$ of the players is described by Fig. 3 and depends on a parameter $\delta \in [0,1]$. For $\delta \in \{0.2, 0.31, 0.35, 1\}$, the blue region represents the set $u(\mathcal{R}) \cap \mathrm{IR}$ of utilities that satisfy the information constraint (9). The hatched blue region represents the convex hull $\operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$ of the utilities that can be supported by a uniform equilibrium strategy (Theorem 2). In that case, the encoder can maintain virtually perfect monitoring even if one of the players deviates. Note that for $\delta < 0.31$, the precision of the private monitoring is sufficient to guarantee the same equilibrium utility region as for the Folk theorem with perfect monitoring.



o Less noisy imperfect monitoring: $0 < \delta < 0.31$. The blue hatched utility region $\operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$ is equal to the utility region of the Folk theorem [Aumann(1981b)] with perfect monitoring, $\operatorname{conv} u(\mathcal{A}) \cap \mathrm{IR}$.

o Perfect monitoring: $\delta = 0$. The utility region coincides with the set of feasible and individually rational utilities.

The proposed approach provides an equilibrium strategy $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ that supports any utility profile in the blue hatched region $U^\star = \operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$ of Fig. 4, while ensuring that the encoder can maintain VPM even in the presence of single deviations.



5 Conclusion 

This paper considers games where players have both private signals, which they receive through the initial monitoring structure, and a public signal, which is sent by an encoder. The encoder is assumed to perfectly monitor the played actions and to send a public signal to the players. The purpose of the encoder is to establish virtual perfect monitoring. Technically, the encoder to be designed takes into account the side information at the receiver and possesses the property of resilience to single deviations. It is shown that the internal information constraint imposes a restriction in terms of feasible utilities in order to establish virtual perfect monitoring and provide an equilibrium utility region (as proved in the case of infinitely repeated games).

The proposed work can be extended in many respects. The targeted monitoring structure can be chosen to be different (e.g., a 2-connected observation graph or a given public signal). The proposed information constraint might be relaxed by assuming that the encoder sends complementary private signals. An interesting result would be to establish a converse, proving that the information constraint is necessary and sufficient. The i.i.d. assumption might be relaxed with the aim of characterizing equilibrium utilities which do not assume i.i.d. action profiles.



A Proof of Theorem 1 

We construct a coding scheme based on graph coloring and statistical tests. Two points have to be considered carefully. First, the side information $s_k^n \in \mathcal{S}_k^n$ may provide some relevant information for player $k$ even if another player $i \in \mathcal{K}$ deviates. Second, the transition probability $\daleth$ that generates the side information $s_k^n \in \mathcal{S}_k^n$ is controlled by the actions $a_k^n \in \mathcal{A}_k^n$ of each player $k \in \mathcal{K}$.

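Parameter. We choose a parameter $\varepsilon > 0$ such that:

\[
\max_{i \in \mathcal{K}} \Big[ \max_{a_i \in \mathcal{A}_i,\, k \in \mathcal{K}} H(a_{-i,k} \mid s_k(a_i), a_k) + \log_2 \chi_i \Big] + 2\varepsilon < \log_2 |\mathcal{S}_0|. \qquad (19)
\]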



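Encoding function $f_0$. The encoder performs the statistical test (20) and constructs, for a given sequence of actions $a^n = (a_1^n, \ldots, a_K^n) \in \mathcal{A}^n$, the following set:

\[
\arg\min_{i \in \mathcal{K}} \sum_{a_{-i} \in \mathcal{A}_{-i}} \left| \frac{N(a_{-i} \mid a_{-i}^n)}{n} - \mathcal{P}^\star_{-i}(a_{-i}) \right|. \qquad (20)
\]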



It chooses one component $i \in \mathcal{K}$ that minimizes (20). The symbols of the component $i \in \mathcal{K}$ will be encoded using the minimal coloring $\phi_i : \mathcal{A}_i \to \Phi_i$ (Def. 5) of the graph $\mathcal{G}_i$ defined by Def. ( i). Denote by $\chi_i$ the chromatic number of the graph $\mathcal{G}_i$ and

• encode the index of the chosen component $i \in \mathcal{K}$ using $|\mathcal{K}|$ sequences $s_0^n \in \mathcal{S}_0^n$ of public signals;
• encode the sequence of colors $c_i^n \in \Phi_i^n$ that corresponds to the sequence of actions $a_i^n \in \mathcal{A}_i^n$, with $c_i = \phi_i(a_i)$ at each stage, using $\chi_i^n$ sequences $s_0^n \in \mathcal{S}_0^n$ of public signals.
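The coloring step can be illustrated on a small example. The sketch below is not the construction of the paper: it assumes, for illustration only, that two actions are adjacent in the graph whenever some private signal has positive probability under both of them (so the signal alone cannot separate them), and it finds a minimal coloring by brute force. The names `confusability_graph`, `chromatic_number` and `binary_channel` are hypothetical.

```python
from itertools import product

def confusability_graph(channel):
    """channel[a][s] is the probability of signal s given action a.
    Two actions are adjacent when some signal is possible under both."""
    n = len(channel)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            if any(pa > 0 and pb > 0 for pa, pb in zip(channel[a], channel[b])):
                edges.add((a, b))
    return n, edges

def chromatic_number(n, edges):
    """Smallest number of colors such that adjacent vertices get different
    colors, together with one such coloring (exhaustive search)."""
    for colors in range(1, n + 1):
        for assignment in product(range(colors), repeat=n):
            if all(assignment[a] != assignment[b] for a, b in edges):
                return colors, assignment
    return n, tuple(range(n))

def binary_channel(delta):
    """Two actions, two signals, wrong signal with probability delta / 2."""
    flip = delta / 2.0
    return [[1 - flip, flip], [flip, 1 - flip]]
```

Under this assumed adjacency rule, the binary monitoring structure of Fig. 3 gives a chromatic number of 2 for any $\delta > 0$ and of 1 for $\delta = 0$, in line with the value $\chi_1 = \chi_2 = 2$ used in Sec. 4.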

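The other components $a_{-i}^n \in \mathcal{A}_{-i}^n$ will be encoded depending on the transition probability $\daleth$ and on the sequence $a_i^n \in \mathcal{A}_i^n$. For example, if the symbol $a_i \in \mathcal{A}_i$ has been used at a high enough frequency, the sequences of signals $(s_k(a_i))_{k \in \mathcal{K}}$, drawn from the transition probability $\daleth_{a_i} : \mathcal{A}_{-i} \to \Delta(\mathcal{S}_k)$, are sufficiently long to use a source coding scheme of the Slepian and Wolf type [Slepian and Wolf(1973)]. Otherwise, the information $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ should be encoded directly, without any compression. The encoder splits the sequences $s_k^n \in \mathcal{S}_k^n$ into sub-sequences $(s_k^{n_{a_i}})_{a_i \in \mathcal{A}_i}$ indexed by the symbols $a_i \in \mathcal{A}_i$, where $n_{a_i} = N(a_i|a_i^n)$. The sub-sequence $s_k^{n_{a_i}} \in \mathcal{S}_k^{n_{a_i}}$ has length $n_{a_i} \in \mathbb{N}$ and is drawn i.i.d. from the joint probability $\mathcal{P}^\star_{-i} \otimes \daleth_{a_i} \in \Delta(\mathcal{A}_{-i} \times \mathcal{S})$. The encoder evaluates the partition $(\bar{\mathcal{A}}_i, \bar{\mathcal{A}}_i^c)$ of the symbols $a_i \in \mathcal{A}_i$ defined as follows. For each $\varepsilon > 0$, there exists an $n_1$ such that the error probability of the Slepian and Wolf [Slepian and Wolf(1973)] coding is upper bounded by $\varepsilon > 0$:

• $a_i \in \bar{\mathcal{A}}_i$ if $N(a_i|a_i^n) = n_{a_i} < n_1$, and then the sequence $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ is encoded directly, without compression, with sequences $s_0 \in \mathcal{S}_0$ of public signals.

• $a_i \in \bar{\mathcal{A}}_i^c$ if $N(a_i|a_i^n) = n_{a_i} \geq n_1$, and then the sequence $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ is encoded using the "random binning technique" of Slepian and Wolf [Slepian and Wolf(1973)].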

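The random binning technique [Slepian and Wolf(1973)] consists in randomly assigning each of the $2^{n_{a_i} H(a_{-i})}$ typical sequences $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ to one of the $2^{n_{a_i} (\max_{k \in \mathcal{K}} H(a_{-i,k} | s_k(a_i), a_k) + \varepsilon)}$ bins. Note that $H(a_{-i,k}|s_k(a_i), a_k) = H(a_{-i}|s_k(a_i), a_k)$. Each bin $B(s_0)$ is indexed by a sequence $s_0^{n_{a_i}} \in \mathcal{S}_0^{n_{a_i}}$ of public signals and contains $2^{n_{a_i} (I(a_{-i}; s_k(a_i), a_k) - \varepsilon)}$ typical sequences $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$. The encoder $\mathcal{C}$ observes a sequence of realized actions $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$. If this sequence is typical, then it sends to all the players $\mathcal{K}$ the sequence of public signals $s_0 \in \mathcal{S}_0$ corresponding to the bin containing the sequence $a_{-i}^{n_{a_i}} \in B(s_0)$. If it is not, the encoder $\mathcal{C}$ declares an error.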
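The binning idea can be tried on a toy example. The following Python sketch is only an illustration under simplifying assumptions: sequences are binary, all sequences (not only the typical ones) are binned, and joint typicality with the side information is replaced by a Hamming-distance threshold `max_dist`. The names `make_bins`, `encode` and `decode` are hypothetical.

```python
import random
from itertools import product

def hamming(x, y):
    """Number of positions where the two sequences differ."""
    return sum(a != b for a, b in zip(x, y))

def make_bins(n, num_bins, seed=0):
    """Randomly assign every binary sequence of length n to one of num_bins bins."""
    rng = random.Random(seed)
    return {seq: rng.randrange(num_bins) for seq in product((0, 1), repeat=n)}

def encode(seq, bins):
    """The encoder only sends the index of the bin that contains the sequence."""
    return bins[tuple(seq)]

def decode(bin_index, side_info, bins, max_dist):
    """Return the unique sequence of the bin that is close to the side
    information, or None when zero or several candidates remain (error)."""
    candidates = [s for s, b in bins.items()
                  if b == bin_index and hamming(s, side_info) <= max_dist]
    return candidates[0] if len(candidates) == 1 else None
```

With more bins, fewer sequences share an index and the decoder is less likely to find a second candidate compatible with its side information; this is the trade-off that the rate condition on the number of bins controls.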

Decoding function of player $k \in \mathcal{K}$. The decoding player receives the index $i \in \mathcal{K}$ of the player chosen by the statistical test (20). Using the appropriate codebook, it decodes separately the information regarding the component $i \in \mathcal{K}$ and the other components $j \in \mathcal{K} \setminus \{i\}$.

• Knowing the component $i \in \mathcal{K}$ chosen by the statistical test, the side information $s_k \in \mathcal{S}_k$ and the color $c_i \in \Phi_i$, the decoding player $k \in \mathcal{K}$ decodes a unique stage symbol $a_i \in \mathcal{A}_i$ for component $i \in \mathcal{K}$.

The decoding player $k \in \mathcal{K}$ then knows the entire sequence of actions $a_i^n \in \mathcal{A}_i^n$ and it characterizes the partition $\bar{\mathcal{A}}_i$ and $\bar{\mathcal{A}}_i^c$ of the set of symbols $\mathcal{A}_i$.

• For the transition $\daleth_{a_i}$ controlled by a symbol $a_i \in \bar{\mathcal{A}}_i$, the sequence of symbols $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ is directly decoded.

• For the transition $\daleth_{a_i}$ controlled by a symbol $a_i \in \bar{\mathcal{A}}_i^c$, the sequence of actions $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ is decoded using Slepian and Wolf decoding [Slepian and Wolf(1973)]. The decoding player $k \in \mathcal{K}$ finds, in the bin $B(s_0)$ corresponding to the sequence of public signals $s_0 \in \mathcal{S}_0$, a sequence $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ which is jointly typical with the sequence of side information $s_k^{n_{a_i}}(a_i) \in \mathcal{S}_k^{n_{a_i}}$ for the probability distribution $\mathcal{P}^\star_{-i} \otimes \daleth_{a_i} \in \Delta(\mathcal{A}_{-i} \times \mathcal{S}_k)$.

Cardinality of $\mathcal{S}_0^n$. Let $n_2 \geq \frac{\log |\mathcal{K}| + n_1 |\mathcal{A}_i| \log |\mathcal{A}_{-i}|}{\varepsilon}$. Then for all $n \geq n_2$, the cardinality $|\mathcal{S}_0|^n$ of the set of sequences is greater than the number of sequences of the coding scheme, $2^{n(R^\star+3\varepsilon)}$:



+logXi + — ^-^^log|-4_i| + ^(max_f/(a_i fe|sfc(ai),afc) +e) 

n n '—t n keic 



< max 
~ ie/c 



< max 
~ ieic 



max _ff(a_i fc|sfc(ai),afc) + logXi 
a^eAi 



max _f/(a_i_fc|sfc(ai),ai.) + logxi 



log|_ft:| + r).i|A|log|A_i| 



■ 2e 



= R* + 3e 

< log2 l^ol- 



(21) 
(22) 



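Error probability. Suppose that player $i \in \mathcal{K}$ chooses his sequence of actions $a_i^n \in \mathcal{A}_i^n$ with an arbitrary sequence of distributions. There are two possibilities. First, the statistical test (20) returns the deviating player $i \in \mathcal{K}$. Second, the statistical test returns another player $j \neq i$.

• Suppose that the statistical test (20) returns the deviating player $i \in \mathcal{K}$. In that case, the "random binning technique" of Slepian and Wolf [Slepian and Wolf(1973)] guarantees that for all $a_i \in \bar{\mathcal{A}}_i^c$ the sequence of vectors of actions $a_{-i}^{n_{a_i}} \in \mathcal{A}_{-i}^{n_{a_i}}$ is perfectly reconstructed with large probability. Let us define the following events:

\[
E_1 = \bigcup_{k \in \mathcal{K},\, a_i \in \mathcal{A}_i} \Big\{ (a_{-i}^{n_{a_i}}, s_k^{n_{a_i}}) \notin A_\varepsilon^{\star n}(\mathcal{A}_{-i} \times \mathcal{S}_k) \Big\}. \qquad (23)
\]

There exists a player $k \in \mathcal{K}$ for which the random sequences of actions and private signals $(a_{-i}^{n_{a_i}}, s_k^{n_{a_i}}) \in \mathcal{A}_{-i}^{n_{a_i}} \times \mathcal{S}_k^{n_{a_i}}$ are not typical.

\[
E_2 = \bigcup_{k \in \mathcal{K},\, a_i \in \mathcal{A}_i} \Big\{ \exists\, \tilde a_{-i}^{n_{a_i}} \neq a_{-i}^{n_{a_i}} \in B(s_0^{n_{a_i}}),\; (\tilde a_{-i}^{n_{a_i}}, s_k^{n_{a_i}}(a_i), a_k^{n_{a_i}}) \in A_\varepsilon^{\star n}(\mathcal{A}_{-i} \times \mathcal{S}_k) \Big\}. \qquad (24)
\]

There exists another sequence $\tilde a_{-i}^{n_{a_i}}$ in the bin $B(s_0^{n_{a_i}})$ corresponding to the sequence of public signals $s_0^{n_{a_i}} \in \mathcal{S}_0^{n_{a_i}}$ that is jointly typical with the sequences of private signals $s_k^{n_{a_i}}(a_i)$ and actions $a_k^{n_{a_i}}$ of the player $k \in \mathcal{K}$.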

o From Lemma 6 of App. C, the error probability $\mathcal{P}(E_1)$ is lower than $\varepsilon \cdot K \cdot |\mathcal{A}_i|$ as soon as $n$ is sufficiently large.
o From Lemma 7 of App. C, the expected error probability $\mathbb{E}_\mu[\mathcal{P}(E_2)]$ of the random code $\mu \in \Delta(\Lambda(n))$ is lower than $\varepsilon \cdot K \cdot |\mathcal{A}_i|$ as soon as $n$ is sufficiently large and the condition (25) is satisfied.



(25) 



Lemma 7 applies because the random sequence $\tilde a_{-i}^{n_{a_i}}$ is generated independently of the random sequences $(s_k^{n_{a_i}}(a_i), a_k^{n_{a_i}})$. This ensures the existence of a code $\lambda \in \Lambda(n)$ whose error probability is upper bounded: $\mathcal{P}_\lambda(E_2) \leq 2\varepsilon$.
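• Suppose that the statistical test returns another player $j \neq i \in \mathcal{K}$. This implies the following inequality:

\[
\sum_{a_{-j} \in \mathcal{A}_{-j}} \left| \frac{N(a_{-j} \mid a_{-j}^n)}{n} - \mathcal{P}^\star_{-j}(a_{-j}) \right| \;\leq\; \sum_{a_{-i} \in \mathcal{A}_{-i}} \left| \frac{N(a_{-i} \mid a_{-i}^n)}{n} - \mathcal{P}^\star_{-i}(a_{-i}) \right|. \qquad (26)
\]

For every player $j \in \mathcal{K} \setminus \{i\}$, the sequence $a_j^n \in \mathcal{A}_j^n$ is drawn i.i.d. from stage to stage with the distribution $\mathcal{P}_j^\star \in \Delta(\mathcal{A}_j)$. From Lemma 6, these action sequences are typical with large probability as $n$ goes to infinity. Then the sequence of actions $a^n \in \mathcal{A}^n$ is typical with large probability and is therefore correctly encoded and decoded. There exists $n$ sufficiently large such that the error probability is upper bounded: $\mathcal{P}_e(\lambda) \leq \varepsilon$.

We have therefore proved the existence of a code $\lambda \in \Lambda(n)$ whose error probability is upper bounded: $\mathcal{P}_e(\lambda) \leq 2\varepsilon \cdot K \cdot |\mathcal{A}_i|$.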



B Proof of Theorem 2 

We prove the following inclusion conv u(TZ) D IR C NE^^. First, we consider a utility 
vector $U \in u(\mathcal{R}) \cap \mathrm{IR}$ and provide a pair of strategies for the encoder and the players $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ that forms a uniform equilibrium (see Def. 6). The first condition (i) is satisfied when the asymptotic utility of the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ converges toward the utility $U$. The second condition (ii) is satisfied when no unilateral deviation $\tau'_k \in \mathcal{T}_k$ provides player $k \in \mathcal{K}$ with a gain larger than $\bar\varepsilon > 0$.



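B.1 Construction of strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$

B.1.1 Block coding scheme

The $T > 0$ stages of the repeated game are divided into $B$ blocks of stages of length $n$, as represented by Fig. 5. Denote by $\mathcal{B}$ the set of blocks, by $b \in \mathcal{B}$ the index of one block and by $B = |\mathcal{B}| \in \mathbb{N}$ the number of blocks. Denote by $s^n(b) \in \mathcal{S}^n$ the sequence of signals received during the block $b \in \mathcal{B}$. Fix the parameter $\bar\varepsilon > 0$ and let us describe the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ that satisfy both conditions (27) and (28) for all $T \geq \bar T$.

\[
\big| \gamma_k^T(\sigma^\star, \tau^\star) - U_k \big| \leq \bar\varepsilon, \quad \forall k \in \mathcal{K}, \qquad (27)
\]
\[
\gamma_k^T(\sigma^\star, \tau^\star) + \bar\varepsilon \geq \gamma_k^T(\sigma^\star, \tau'_k, \tau^\star_{-k}), \quad \forall k \in \mathcal{K},\; \forall \tau'_k \in \mathcal{T}_k. \qquad (28)
\]

Suppose that the number of blocks $B \in \mathbb{N}$ satisfies condition (29):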

8 ■ maxag^ |"fe(a)| 



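B.1.2 Strategy of the encoder $\sigma^\star \in \Sigma$

The coding strategy $\sigma^\star \in \Sigma$ consists in sending a sequence of public signals $s_0^n \in \mathcal{S}_0^n$ to each player so that they can reconstruct the sequence $a^n \in \mathcal{A}^n$ of past actions. In order to communicate, the encoder $\mathcal{C}$ and the players $\mathcal{K}$ implement a code $\lambda = (f_0, (g_k)_{k \in \mathcal{K}})$ investigated in Sec. 3 and defined by:

\[
\begin{aligned}
f_0 &: \mathcal{A}^n \to \mathcal{S}_0^n, \\
g_k &: \mathcal{S}_k^n \times \mathcal{S}_0^n \times \mathcal{A}_k^n \to \mathcal{A}^n, \quad \forall k \in \mathcal{K}.
\end{aligned} \qquad (30)
\]

Condition $U \in \operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$ implies that the probability distribution $\mathcal{P}^\star \in \bigotimes_{k \in \mathcal{K}} \Delta(\mathcal{A}_k)$ belongs to the set $\mathcal{R}$ described by (17) and satisfies the condition (9) of Theorem 1. This coding result implies that for all $\varepsilon > 0$, there exist a parameter $n \in \mathbb{N}$ and a code $\lambda \in \Lambda(n)$ with $\lambda = (f_0, (g_k)_{k \in \mathcal{K}})$ such that the error probability of the coding scheme is upper bounded by $\varepsilon$. Denote by $a^n(k) \in \mathcal{A}^n$ the sequence of actions obtained as output by the decoder $k \in \mathcal{K}$.

\[
\mathcal{P}_e(\lambda) = \sum_{k \in \mathcal{K}} \max_{i \in \mathcal{K},\, v_i} \mathcal{P}_{v_i}\big( a^n \neq g_k(s_k^n, s_0^n, a_k^n) \big). \qquad (31)
\]

The strategy of the encoder $\sigma^\star \in \Sigma$ is built as follows. At the beginning of the block $b \in \mathcal{B}$ with $b \geq 2$, the encoder $\mathcal{C}$ observes the sequence of actions $a^n(b-1) \in \mathcal{A}^n$ over the block $b-1 \in \mathcal{B}$ and chooses the sequence of public signals $s_0^n(b)$ over block $b \in \mathcal{B}$ using the encoding function $f_0$ (30) provided by the code $\lambda$ that satisfies the condition (31).

\[
s_0^n(b) = f_0\big(a^n(b-1)\big) \in \mathcal{S}_0^n. \qquad (32)
\]

Over the first block $b_1 \in \mathcal{B}$, the encoder sends an arbitrary sequence $s_0(b_1) \in \mathcal{S}_0^n$.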



B.1.3 Decoding scheme

At the end of the block $b \in \mathcal{B}$ with $b \geq 3$, player $k$ implements the decoding function (30) provided by the code $\lambda$ that satisfies the condition (31). The player $k \in \mathcal{K}$ recalls his own actions $a_k^n(b-1) \in \mathcal{A}_k^n$ and observes the sequences of private signals $s_k^n(b-1) \in \mathcal{S}_k^n$ and public signals $s_0^n(b) \in \mathcal{S}_0^n$ sent by the encoder $\mathcal{C}$. The player $k \in \mathcal{K}$ evaluates the sequence $a^n(k, b-1)$ of actions of block $b-1 \in \mathcal{B}$ using the decoding function $g_k$.

\[
a^n(k, b-1) = g_k\big( s_k^n(b-1), s_0^n(b), a_k^n(b-1) \big) \in \mathcal{A}^n. \qquad (33)
\]

Condition (31) guarantees that at the beginning of block $b+1 \in \mathcal{B}$, each player $k \in \mathcal{K}$ observes the sequence of actions $a^n(b-1) \in \mathcal{A}^n$ of the other players during the block $b-1 \in \mathcal{B}$ with an arbitrarily low error probability. Over the first two blocks $b_1, b_2 \in \mathcal{B}$, no decoding strategy is implemented.




Fig. 5 The strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ of the encoder $\mathcal{C}$ and the players $\mathcal{K}$ are described in sections B.1.2 and B.1.7. The actions $a^n(b)$ over block $b \in \mathcal{B}$ are encoded over the next block $b+1 \in \mathcal{B}$ into a sequence of public signals $s_0^n(b+1)$. At the end of block $b+1 \in \mathcal{B}$, player $k \in \mathcal{K}$ decodes the sequence of actions $a^n(b)$ over block $b \in \mathcal{B}$ from the sequences of signals $s_0^n(b+1)$ and $s_k^n(b)$. Player $k \in \mathcal{K}$ performs a statistical test in order to detect possible unilateral deviations. The result of this statistical test determines the sequence of actions $a_k^n(b+2)$ that player $k \in \mathcal{K}$ will play during the block $b+2 \in \mathcal{B}$.
B.1.4 Statistical test

Each player $k \in \mathcal{K}$ performs a statistical test at the beginning of each block $b+1 \in \mathcal{B}$. Define the event $E_k^i(b+1)$ using the set of typical sequences $A_\varepsilon^{\star n}(\mathcal{P}^\star)$ stated by definition 9.

When $E_k^i(b+1) = 1$, player $k \in \mathcal{K}$ declares that player $i \in \mathcal{K}$ deviated from the prescribed strategy $\tau_i^\star \in \mathcal{T}_i$ during block $b-1 \in \mathcal{B}$.
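As an illustration of such a test, the sketch below flags a decoded action sequence whose empirical distribution is farther than $\varepsilon$ (in $\ell_1$ distance) from the prescribed mixed action, which is the typicality criterion appearing in (52). It is a minimal sketch, not the exact test (34); the names `is_typical` and `deviation_flag` are hypothetical.

```python
from collections import Counter

def is_typical(sequence, target, eps):
    """True when the empirical frequencies of the sequence are within eps
    (sum of absolute differences) of the target distribution."""
    n = len(sequence)
    counts = Counter(sequence)
    support = set(target) | set(counts)
    deviation = sum(abs(counts.get(a, 0) / n - target.get(a, 0.0))
                    for a in support)
    return deviation <= eps

def deviation_flag(decoded_actions, target, eps):
    """Return 1 when the decoded actions of a player look like a deviation
    from the prescribed mixed action, and 0 otherwise."""
    return 0 if is_typical(decoded_actions, target, eps) else 1
```

For example, with a prescribed action that mixes uniformly between two actions, a block in which only one action is played is flagged, while a balanced block is not.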



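B.1.5 Main plan

The main plan consists in playing the same mixed action $\mathcal{P}^\star$ i.i.d. from stage to stage.

\[
\mathcal{P}^\star(a^t) = \mathcal{P}_1^\star(a_1) \otimes \ldots \otimes \mathcal{P}_K^\star(a_K) \in \prod_{k \in \mathcal{K}} \Delta(\mathcal{A}_k), \quad \forall t \geq 1. \qquad (35)
\]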



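B.1.6 Punishment plan for player $i \in \mathcal{K}$

The punishment plan $\mathcal{P}(i) = (\mathcal{P}_k(i))_{k \neq i} \in \prod_{k \neq i} \Delta(\mathcal{A}_k)$ of player $i \in \mathcal{K}$ consists of a vector of mixed actions of the other players that minimizes the utility of player $i \in \mathcal{K}$.

\[
\mathcal{P}(i) \in \arg\min_{\mathcal{P}_{-i} \in \prod_{k \neq i} \Delta(\mathcal{A}_k)} \; \max_{\mathcal{P}_i \in \Delta(\mathcal{A}_i)} \mathbb{E}_{\mathcal{P}_i, \mathcal{P}_{-i}}\big[ u_i(a) \big], \quad \forall i \in \mathcal{K}. \qquad (36)
\]

The punishment plan for player $i \in \mathcal{K}$ by player $k \in \mathcal{K}$ is denoted $\mathcal{P}_k(i) \in \Delta(\mathcal{A}_k)$ and is given by (36). If all the players $k \neq i$ play the strategy $\mathcal{P}(i) = (\mathcal{P}_k(i))_{k \neq i}$, the player $i \in \mathcal{K}$ cannot obtain a utility greater than his min-max level $v_i \in \mathbb{R}$ characterized by (15).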
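For a two-player game with two actions per player, the min-max level can be approximated by a grid search over the punisher's mixed action. The Python sketch below is an illustration only: the function name `min_max_level` is hypothetical, and the prisoner's dilemma payoffs used in the example (3 for mutual cooperation, 1 for mutual defection, 4 and 0 for unilateral defection) are assumed values, not taken from the paper.

```python
def min_max_level(payoff, grid=1000):
    """payoff[a_i][a_j] is the utility of the punished player i when he plays
    a_i and the punisher plays a_j (two actions for the punisher).
    Returns (level, q), where q is the probability that the punisher plays
    action 0 in the minimizing mixed action."""
    best = None
    for step in range(grid + 1):
        q = step / grid
        # Best reply of player i against the mixed action (q, 1 - q).
        value = max(q * row[0] + (1 - q) * row[1] for row in payoff)
        if best is None or value < best[0]:
            best = (value, q)
    return best

# Hypothetical prisoner's dilemma payoffs for the punished player:
# rows = own action (cooperate, defect), columns = opponent (cooperate, defect).
PD_PAYOFF = [[3, 0], [4, 1]]
```

With these assumed payoffs, the search returns a punisher who defects with probability one, which is the usual punishment in the prisoner's dilemma.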



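B.1.7 Equilibrium strategy $\tau^\star = (\tau_k^\star)_{k \in \mathcal{K}} \in \mathcal{T}$

At the beginning of each block $b \geq 3$, $b \in \mathcal{B}$, the equilibrium strategy is described as follows:

• Player $k$ implements the decoding scheme (Sec. B.1.3) and reconstructs the actions $a_{-k}^n(b-2) \in \mathcal{A}_{-k}^n$ played by the other players $j \neq k$ during block $b-2 \in \mathcal{B}$.

• Player $k$ implements the statistical test $E_k^i$ defined in section B.1.4, in order to detect possible unilateral deviations.

• If the statistical test is negative ($\forall b' \leq b$, $\forall i \in \mathcal{K}$, $E_k^i(b') = 0$), then player $k \in \mathcal{K}$ plays the main plan $\mathcal{P}_k^\star \in \Delta(\mathcal{A}_k)$ stated in section B.1.5 during every stage of block $b \in \mathcal{B}$.

• If the statistical test is positive ($\exists b' \leq b$, $\exists i \in \mathcal{K}$, $E_k^i(b') = 1$), then player $k \in \mathcal{K}$ plays the punishment plan $\mathcal{P}_k(i) \in \Delta(\mathcal{A}_k)$ stated in section B.1.6 corresponding to the player $i \in \mathcal{K}$ until the end of the last block $B \in \mathcal{B}$. If several deviations are detected simultaneously, $E_k^i(b) = E_k^j(b) = 1$, then player $k \in \mathcal{K}$ punishes the smallest of those players according to a total order over $\mathcal{K}$ fixed in advance.

Over the first two blocks $b_1, b_2 \in \mathcal{B}$, the players $\mathcal{K}$ play the main plan $\mathcal{P}^\star \in \Delta(\mathcal{A})$. The equilibrium strategy $\tau^\star = (\tau_k^\star)_{k \in \mathcal{K}} \in \mathcal{T}$ is defined at each stage $t \geq 1$ as follows:

\[
\tau_k^{\star t} =
\begin{cases}
\mathcal{P}_k^\star \in \Delta(\mathcal{A}_k) & \text{while } E_k^i(b) = 0,\ \forall i \neq k,\ \forall b \leq \lfloor t/n \rfloor, \\
\mathcal{P}_k(i) \in \Delta(\mathcal{A}_k) & \text{otherwise.}
\end{cases} \qquad (37)
\]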
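The decision rule of the equilibrium strategy can be summarized in a few lines. The following Python sketch is illustrative: the function name `plan_for_block` and the data layout (a dictionary of test outcomes per block) are assumptions made for the example, and the actual mixed actions are replaced by labels.

```python
def plan_for_block(b, flags, order):
    """Plan followed by a player during block b (blocks are numbered from 1).

    flags[b_prime][i] == 1 means that the statistical test run at block
    b_prime pointed at player i; order lists the players according to the
    total order fixed in advance. Returns "main" or ("punish", i)."""
    if b <= 2:
        return "main"  # no decoding is implemented over the first two blocks
    for earlier in range(1, b + 1):
        accused = [i for i in order if flags.get(earlier, {}).get(i, 0) == 1]
        if accused:
            # Punish the smallest accused player, and keep doing so afterwards.
            return ("punish", accused[0])
    return "main"
```

Because the rule scans all earlier blocks, a positive test triggers the punishment plan for the rest of the game, as prescribed by the trigger structure of the strategy.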
B.2 Condition (i) of definition 6: convergence of the utilities

Let us fix a parameter $\bar\varepsilon > 0$ and prove that there exists a $\bar T \geq 1$ such that for all $T \geq \bar T$, the utilities of the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ of the encoder and the players $\mathcal{K}$, defined in Sec. B.1.2 and B.1.7, are $\bar\varepsilon$-close to the utility $U \in \operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$. Remark that $\varepsilon > 0$ and $\bar\varepsilon > 0$ are two distinct parameters. Define the following event:

\[
E =
\begin{cases}
1 & \text{if } \exists b \in \mathcal{B},\ \exists i, k \in \mathcal{K} \text{ such that } E_k^i(b) = 1, \\
0 & \text{otherwise.}
\end{cases} \qquad (38)
\]

When $E = 0$, no unilateral deviation is detected during the course of the game.

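Lemma 1 Suppose that the encoder $\mathcal{C}$ and the players $\mathcal{K}$ implement the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$. Then for all $\varepsilon > 0$, there exists a block length $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the probability of event $E = 1$ is bounded as follows:

\[
\mathcal{P}(E = 1) \leq 2\varepsilon \cdot B \cdot K^2. \qquad (39)
\]

The result of Lemma 1 is useful for the proof of Lemma 2.

Lemma 2 Suppose that the encoder $\mathcal{C}$ and the players $\mathcal{K}$ implement the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$. Then for all $\varepsilon > 0$, there exists a block length $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the expected utility satisfies the following inequality:

\[
\Big| \gamma_k^T(\sigma^\star, \tau^\star) - \mathbb{E}_{\mathcal{P}^\star}\big[ u_k(a_k, a_{-k}) \big] \Big| \leq 4\varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| \cdot B \cdot K^2, \quad \forall k \in \mathcal{K}. \qquad (40)
\]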



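For the parameter $\bar\varepsilon > 0$ and a fixed number of blocks $B \in \mathbb{N}$, there exist a parameter $\varepsilon > 0$ and a block length $n_1 \in \mathbb{N}$ such that $4\varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| \cdot B \cdot K^2 \leq \bar\varepsilon$. From Lemma 2, the strategy defined over $T = n \cdot B$ stages induces, for each player $k \in \mathcal{K}$, a utility that satisfies:

\[
\big| \gamma_k^T(\sigma^\star, \tau^\star) - U_k \big| \leq \bar\varepsilon, \quad \forall k \in \mathcal{K}. \qquad (41)
\]

By repeating the strategies cyclically, we prove that there exists a $\bar T$ such that for all $T' \geq \bar T$ and for all players $k \in \mathcal{K}$, the expected $T'$-stage utility $\gamma_k^{T'}(\sigma^\star, \tau^\star)$ is $\bar\varepsilon$-close to the utility $U \in \operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$. The strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ satisfy the condition (i) of definition 6.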
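The choice of the coding parameter $\varepsilon$ from the target accuracy $\bar\varepsilon$ is a one-line computation. The following Python sketch (hypothetical helper names, for illustration) returns the largest $\varepsilon$ allowed by the bound of Lemma 2 and the corresponding bound of Lemma 1 on the probability of a false detection.

```python
def coding_error_budget(eps_bar, max_abs_utility, num_blocks, num_players):
    """Largest eps such that 4 * eps * max|u_k| * B * K^2 <= eps_bar."""
    return eps_bar / (4 * max_abs_utility * num_blocks * num_players ** 2)

def detection_bound(eps, num_blocks, num_players):
    """Bound 2 * eps * B * K^2 of Lemma 1 on the probability of E = 1."""
    return 2 * eps * num_blocks * num_players ** 2
```

For example, with two players, ten blocks, utilities bounded by 4 in absolute value and $\bar\varepsilon = 0.1$, the coding error probability must be of the order of $10^{-4}$, which shows how quickly the requirement on the code tightens as the number of blocks grows.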

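Proof. [Lemma 1] Denote by $a_i^n(k, b)$ the sequence of actions of player $i \in \mathcal{K}$ observed by player $k \in \mathcal{K}$ over block $b \in \mathcal{B}$. For all $\varepsilon > 0$, there exists $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, we have:

\[
\mathcal{P}\Big( a_i^n(b) \notin A_\varepsilon^{\star n}(\mathcal{P}_i^\star) \;\Big|\; \bigcap_{j,k \in \mathcal{K}} \big\{ E_k^j(b-1) = 0, \ldots, E_k^j(b_1) = 0 \big\} \Big) \leq \varepsilon, \quad \forall i, k \in \mathcal{K},\ \forall b \in \mathcal{B}, \qquad (42)
\]
\[
\mathcal{P}\Big( a_i^n(k, b) \neq a_i^n(b) \;\Big|\; \bigcap_{j,k \in \mathcal{K}} \big\{ E_k^j(b-1) = 0, \ldots, E_k^j(b_1) = 0 \big\} \Big) \leq \varepsilon, \quad \forall i, k \in \mathcal{K},\ \forall b \in \mathcal{B}. \qquad (43)
\]

Equations (42) and (43) come from the definition of the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$. When no deviation is detected, the players implement the main plan (Sec. B.1.5) by playing i.i.d. the mixed action $\mathcal{P}^\star \in \Delta(\mathcal{A})$.

Equation (42) is a consequence of Lemma 6 for the typical sequences and (43) is a consequence of the coding result stated by Theorem 1 in Sec. 3.2 for an i.i.d. information source $\mathcal{P}^\star \in \Delta(\mathcal{A})$. More precisely, this inequality is a consequence of (31), which guarantees that, at the beginning of block $b \in \mathcal{B}$, the players observe the sequence of actions played by the other players over the block $b-2 \in \mathcal{B}$ with probability $1 - \varepsilon$.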
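Let us evaluate the probability of event $E = 1$. Write $F_b = \bigcap_{j,k \in \mathcal{K}} \{ E_k^j(b-1) = 0, \ldots, E_k^j(b_1) = 0 \}$ for the event that no deviation was detected before block $b$.

\[
\begin{aligned}
\mathcal{P}(E = 1) &= \mathcal{P}\Big( \bigcup_{b \in \mathcal{B},\, i,k \in \mathcal{K}} \big\{ E_k^i(b) = 1 \big\} \Big) && (44) \\
&\leq \sum_{b \in \mathcal{B},\, i,k \in \mathcal{K}} \mathcal{P}\big( E_k^i(b) = 1 \,\big|\, F_b \big) && (45) \\
&\leq \sum_{b \in \mathcal{B},\, i,k \in \mathcal{K}} \mathcal{P}\Big( \big\{ a_i^n(b) \notin A_\varepsilon^{\star n}(\mathcal{P}_i^\star) \big\} \cup \big\{ a_i^n(k,b) \neq a_i^n(b) \big\} \,\Big|\, F_b \Big) && (46) \\
&\leq \sum_{b \in \mathcal{B},\, i,k \in \mathcal{K}} \Big[ \mathcal{P}\big( a_i^n(b) \notin A_\varepsilon^{\star n}(\mathcal{P}_i^\star) \,\big|\, F_b \big) + \mathcal{P}\big( a_i^n(k,b) \neq a_i^n(b) \,\big|\, F_b \big) \Big] && (47) \\
&\leq 2\varepsilon \cdot B \cdot K^2. && (48)
\end{aligned}
\]

Equality (44) comes from the definition of the error event provided by (38).
Inequality (45) comes from the property $\mathcal{P}(A \cup B) = \mathcal{P}(A) + \mathcal{P}(B | A^c) \cdot \mathcal{P}(A^c)$.
Inequalities (46) and (47) come from Boole's inequality.
Inequality (48) comes from the inequalities (42) and (43).

As a conclusion, for all $\varepsilon > 0$, there exists an $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the condition (39) is satisfied. □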

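Proof. [Lemma 2] When the event $E = 0$ occurs, all the statistical tests of the players $\mathcal{K}$ at the beginning of each block $b \in \mathcal{B}$ are negative (i.e. $E_k^i(b) = 0$, $\forall b \in \mathcal{B}$, $\forall i, k \in \mathcal{K}$). In this case, the strategy $\tau^\star \in \mathcal{T}$ indicates that the sequences of actions are generated with the same mixed strategy $\mathcal{P}^\star \in \bigotimes_{k \in \mathcal{K}} \Delta(\mathcal{A}_k)$ from stage to stage. From the proof of Lemma 1, for all $\varepsilon > 0$, there exists $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the sequences of block actions are typical with large probability. More precisely, because $T = n \cdot B \geq n \geq n_1$, the sequences of actions are typical with large probability.

\[
\mathcal{P}\big( a^n \notin A_\varepsilon^{\star n}(\mathcal{P}^\star) \,\big|\, E = 0 \big) \leq \varepsilon, \qquad (49)
\]
\[
\mathcal{P}\big( a^T \notin A_\varepsilon^{\star T}(\mathcal{P}^\star) \,\big|\, E = 0 \big) \leq \varepsilon. \qquad (50)
\]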

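Suppose that $n \geq n_1$ defined from Lemma 1. Recall the definition of the typical sequences and some implications thereof:

\[
\begin{aligned}
& a^T \in A_\varepsilon^{\star T}(\mathcal{P}^\star) && (51) \\
\Longrightarrow\;& \sum_{a \in \mathcal{A}} \left| \frac{N(a|a^T)}{T} - \mathcal{P}^\star(a) \right| \leq \varepsilon && (52) \\
\Longrightarrow\;& \sum_{a \in \mathcal{A}} \left| \frac{N(a|a^T)}{T}\, u_k(a) - \mathcal{P}^\star(a)\, u_k(a) \right| \leq \varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| && (53) \\
\Longrightarrow\;& \left| \sum_{a \in \mathcal{A}} \frac{N(a|a^T)}{T}\, u_k(a) - \sum_{a \in \mathcal{A}} \mathcal{P}^\star(a)\, u_k(a) \right| \leq \varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| && (54) \\
\Longrightarrow\;& \left| \frac{1}{T} \sum_{t=1}^{T} u_k(a^t) - \mathbb{E}_{\mathcal{P}^\star}\big[ u_k(a) \big] \right| \leq \varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)|. && (55)
\end{aligned}
\]

Inequality (52) comes from definition 9 of typical sequences.
Inequality (53) comes from the homogeneity property.
Inequality (54) comes from the triangle inequality.

Inequality (55) is a reformulation of (54) and allows us to obtain the following equations: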
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7j(<T*,r*)-Ep* 



1 ^ 



(56) 
(57) 



T T 
aTgA*T(-p*) t=l aT^^*T(p*) t=l 



"ft (a) 



(58) 



aTgAjT(p*) t = l 



«fc(a) 



< 7-'ct*,t* (c"^) ■ e ■ max |-Ufe(a)| + >^ 'Po-* t* (o;^) ■ £ ■ max li^feCfj)! 

< £ ■ max I Mfc (a) | + "P I a'^ ^ A*"^ I ' ^ ' niE'x | u^; (a) 



< max|Mfc(a)| ■ £ + ^(a^ ^ A*^|E = 0) ■ P(E = 0) + P(a^ ^ ^e'^|E = 1) ■ P(E = 1) 



< max |«fc(a)| ■ e + P(a-' ^ A*-* |E = 0) + -P(E = 1) 



< max |Mfc(a)| ■ 2e + P(E = 1) 



< max \uk(a)\ ■ [2£ + 2e ■ B ■ K' 

a^A \ 

< 4e ■ max \ uic(a)\ ■ B ■ K^. 

aeA 



(59) 
(60) 

(61) 

(62) 

(63) 

(64) 

(65) 
(66) 



Equalities (57) and (58) come from the definition of the expected $T$-stage utility, see (12).
Inequality (59) comes from the triangle inequality.
Inequality (60) comes from (55).

Inequalities (61), (62) and (63) are reformulations of (60).
Inequality (64) comes from (50) because, by assumption, $T \geq n \geq n_1$.
Inequality (65) comes from Lemma 1 because, by assumption, $n \geq n_1$.

Inequality (66) is a reformulation with $B \cdot K^2 \geq 1$, which concludes the proof of Lemma 2.

As a conclusion, for all $\varepsilon > 0$, there exists $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the condition (71) is satisfied. □



B.3 Condition (ii) of definition 6 



In order to prove that the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ support a uniform equilibrium, we suppose that player $k \in \mathcal{K}$ implements a deviating strategy $\tau'_k \in \mathcal{T}_k$ and we prove that the deviation gain is less than $\bar\varepsilon > 0$.
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B.3.1 First case: non-typical deviations 

Let $b-1 \in \mathcal{B}$ be the first action block over which the sequence of actions $a_k^n(b-1) \notin A_\varepsilon^{\star n}(\mathcal{P}_k^\star)$ of player $k \in \mathcal{K}$ is not typical. Denote by $t_1(b)$ and $t_n(b)$ the indexes of the first and the last stage of block $b \in \mathcal{B}$.



7fc (o-*,-r^,T*fc) 



1 ^ 

-^nfe(a*) 



Evaluate, for player k G K, the utilities associated with the strategies and t^. 

T 

T ■ 

r T 

t=l 

t = l t=ti{b-l) t = ti{b+l) 



T 
1 , 



(67) 
(68) 

(69) 

Approximation of the utility associated with the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ between blocks $b+1 \in \mathcal{B}$ and $B \in \mathcal{B}$.

Lemma 3 Suppose that the encoder $\mathcal{C}$ and the players $\mathcal{K}$ follow the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$. Then for all $\bar\varepsilon > 0$ and for any number of blocks $B \in \mathbb{N}$, there exists a block length $n_1$ such that for all $n \geq n_1$, the following inequality is satisfied for all $1 \leq b \leq B-1$:



J2 

i=ti(6+l) 



>n{B-b)-{U^--). 



(70) 



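Proof. [Lemma 3] Let us fix the parameter $\bar\varepsilon > 0$ and suppose that the encoder $\mathcal{C}$ and the players $\mathcal{K}$ follow the strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$. This proof is built on Lemma 2, which proves that for all $\varepsilon > 0$, there exists a block length $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the expected utility satisfies the following inequality:

\[
\Big| \gamma_k^T(\sigma^\star, \tau^\star) - \mathbb{E}_{\mathcal{P}^\star}\big[ u_k(a_k, a_{-k}) \big] \Big| \leq 4\varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| \cdot B \cdot K^2, \quad \forall k \in \mathcal{K}. \qquad (71)
\]

For a fixed number of blocks $B \in \mathbb{N}$, we choose $\varepsilon > 0$ such that $\bar\varepsilon \geq 4\varepsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| \cdot B \cdot K^2$. Using the same reasoning as in Lemma 2, we prove that for all $\varepsilon > 0$, there exists a block length $n_1 \in \mathbb{N}$ such that for all $n \geq n_1$, the expected utility satisfies the following inequalities for all $k \in \mathcal{K}$: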



1 



n(B — b) '—^ , 

^ ' t=ti{()+l) 



«fe(afe,a_fc) 



< 4£ ■ max |ufe(a)| ■ B ■ K , 
(72) 



^(7 .tZ ,T 



^O"^ ,T, ,T_ 



-t=ti{i)+i) 

T 

t=ti(b+i) 



- n(B - b) ■ Ul 



< n(B - 6) ■ 4£ ■ max |ufc(a)| ■ B ■ , (73) 



> n{B - b) ■ {Ul - 4e ■ max|ufc(a)| ■ B ■ K^). 

a^A 



(74) 



For a fixed block number _B G N, we choose the parameter ^ > 4£ ■ max;,g_4 \uk{a)\ ■ B ■ 
and the block length rii G N that satisfies (12) and (13) of Lemma 1. We obtain the inequality 
(70) of Lemma 3. □ 

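Upper bound on the deviation gain obtained by player $k \in \mathcal{K}$ using the deviation strategy $\tau'_k \in \mathcal{T}_k$ over blocks $b-1 \in \mathcal{B}$ and $b \in \mathcal{B}$.

Lemma 4 For every block $b \in \mathcal{B}$, the following inequality is satisfied:

\[
\mathbb{E}_{\sigma^\star, \tau'_k, \tau^\star_{-k}} \Bigg[ \sum_{t = t_1(b-1)}^{t_n(b)} u_k(a^t) \Bigg] \leq 2n \cdot \max_{a \in \mathcal{A}} |u_k(a)|, \quad \forall k \in \mathcal{K}.
\]

Proof. [Lemma 4] The proof is direct.

\[
\begin{aligned}
\mathbb{E}_{\sigma^\star, \tau'_k, \tau^\star_{-k}} \Bigg[ \sum_{t = t_1(b-1)}^{t_n(b)} u_k(a^t) \Bigg] && (75) \\
\leq\; \mathbb{E}_{\sigma^\star, \tau'_k, \tau^\star_{-k}} \Big[ \big( t_n(b) - t_1(b-1) + 1 \big) \cdot \max_{a \in \mathcal{A}} |u_k(a)| \Big] && (76) \\
\leq\; 2n \cdot \max_{a \in \mathcal{A}} |u_k(a)|, \quad \forall k \in \mathcal{K}. && (77)
\end{aligned}
\]

The deviation utility satisfies (75). □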

Upper bound on the utility of player $k \in \mathcal{K}$ during the punishment phase induced by the prescribed strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$.

Lemma 5 Suppose that $b-1 \in \mathcal{B}$ is the first block on which the sequence of actions $a_k^n(b-1) \notin A_\varepsilon^{\star n}(\mathcal{P}_k^\star)$ of player $k \in \mathcal{K}$ is not typical. Suppose that the block length satisfies $n \geq n_1$, defined by (42) and (43). The prescribed strategies $(\sigma^\star, \tau^\star) \in \Sigma \times \mathcal{T}$ induce the following inequality:



t=ti{6+l) 



< £ ■ max |«fe(a)[ + n[B — b) ■ v^. 



(78) 



Lemma 5 is a consequence of the coding result stated by Theorem 1.

Proof. We suppose that the actions $a_k^n(b-1) \notin A_\varepsilon^{\star n}(\mathcal{P}_k^\star)$ of player $k \in \mathcal{K}$ over block $b-1 \in \mathcal{B}$ are not typical and that the length of the block satisfies $n \geq n_1$, defined by (42) and (43). Define the error event $E_d$ related to the statistical test $E_j^k(b+1)$ defined by (34). $E_d = 0$ means that the statistical test of all the players $j \neq k \in \mathcal{K}$ reveals that player $k \in \mathcal{K}$ deviated during block $b-1 \in \mathcal{B}$.

\[
E_d =
\begin{cases}
0 & \text{if } \forall j \neq k,\ E_j^k(b+1) = 1, \\
1 & \text{otherwise.}
\end{cases} \qquad (79)
\]



The coding result stated by Theorem 1 allows us to bound the probability of event $E_d = 1$ knowing that $a_k^n(b-1) \notin A_\varepsilon^{\star n}(\mathcal{P}_k^\star)$. The following inequalities are valid for any deviation strategy $\tau'_k \in \mathcal{T}_k$ of player $k \in \mathcal{K}$.



< V, 



.,,,,,*^(Ed = lafc(fe-l)^ArCPfc*)) 



CT" ,T. ,T- 



CT" ,T. ,T- 



3j^fcG/C, E^(6+l) = 



Uj^fc <^E|(6+1) = 



afc(b-l)^A*"(n)) 
}|«fc(&-l)^Ar(n*)) 



CT" ,T' ,T" 



U,^fe^ai(6-l)GArCPfc*) 



a^{b-l)iAV\Vl) 



< Ve{X) 

< V max max V(a" ^ a" (k)\vi) 

< e, Vt(, G Tfc. 



(80) 
(81) 
(82) 

(83) 

(84) 
(85) 

(86) 



Inequalities (81) and (82) come from the definition of $E_d$ and Boole's inequality.

Inequality (83) comes from the definition of the statistical test (34) presented in section B.1.4.

Inequality (84) comes from the strategy of the encoder $\sigma^\star \in \Sigma$ and the decoding scheme of player $j \in \mathcal{K}$, described in sections B.1.2 and B.1.3, based on the coding scheme $\lambda \in \Lambda(n)$ with $n \geq n_1$ during block $b-1 \in \mathcal{B}$.

Inequality (85) comes from the definition of the error probability of the code $\lambda \in \Lambda(n)$.
Inequality (86) comes from the coding result given by Theorem 1 for an arbitrarily varying information source (AVS). When a player deviates, the actions are generated with an uncertain probability distribution satisfying the hypotheses (3) and (4) of definition 1. We suppose that the length of a block satisfies $n \geq n_1$, defined by (42) and (43). Therefore, from Theorem 1, there exists a code $\lambda \in \Lambda(n)$ for which the error probability $\mathcal{P}_e(\lambda)$ is bounded by $\varepsilon > 0$, for every unilateral deviation $\tau'_k \in \mathcal{T}_k$ of player $k \in \mathcal{K}$. This inequality allows us to obtain the upper bound on the utility of player $k \in \mathcal{K}$ during the punishment phase stated by Lemma 5.



J2 «fc(a*) 

t=ti(6+l) 



CT* ,T,. ,T_ 



*ti{6+l)'-^d 



Erf = 1 



afe(b-l)^^r(n) 



(87) 



J2 

t=ti{i)+i) 



E ■^.^r^r*,(<{i,+l).Ed = Oafc(b-l)^Ar(n)) ■ [ J2 



t=tl{i)+l) 



Erf = 1 



akib-l) At'^iVt)) ■max|«fe(a)| + 



n{B -b) -Vk 



< e ■ max In/; (a) I + 



n(B -b) -vt 



(88) 
(89) 

(90) 



Inequality (88) comes from the definition of the expectation (87), knowing that the sequence of actions $a_k^n(b-1) \notin A_\varepsilon^{\star n}(\mathcal{P}_k^\star)$ is not typical over the block $b-1 \in \mathcal{B}$.

Inequality (89) comes from the punishment plan stated by the strategy $\tau^\star_{-k}$ when all the players $j \neq k$ detect the deviation of player $k \in \mathcal{K}$. For each stage $t_1(b+1) \leq t \leq T$, the utility of player $k \in \mathcal{K}$ is then less than the min-max level, $u_k(a^t) \leq v_k$.
Inequality (90) comes from (83). □
Equilibrium condition. Hypothesis (29) over the number of blocks $B \in \mathbb{N}$ and the results of Lemmas 3, 4 and 5 allow us to obtain the following inequalities:

=^ -4^+-B» > 4 . m^ax |ufc(a)| (92) 

=^ (B - b) ■ U* - ^ + Be > 4 • m^ax |ufc(o)| + (B - 6) • vfc (93) 

=^ (B - b + 2) - U* - iB-b)e ^ > 2 . _max |ufc(a)| + , + e ■ m|.x |ufc(a)| + (B - 6) . i^fc (94) 

=^ n(B - 6 + 2) ■ [J* - 1 iB~b+2)e ^ > 2„ ■ max | (a) | + ■ max | (a) | + „(S - 6) ■ (95) 



r n r i 

L n ' J • k- -k Lt^t^(,,_i) J 

(97) 

• -ki i\ ) J ■ fc. -fo L J fc- -ki^^^^^^^^ J 

(98) 

r -I pt„(')-2) t„(b) T -, 

*= -*= ^ -' ■ fc- -fo L t = ti(b-l) t = ti(6 + l) ■' 

(99) 

^ + e >-i'E{<'*.^'k,^*-.k)^ (100) 

Inequality (91) comes from hypothesis (29) on the number of blocks $B \in \mathbb{N}$.
Inequality (92) comes from the reformulation of inequality (91).

Inequality (93) comes from the individual rationality hypothesis $U_k \geq v_k$ stated in
Definition 7.

Inequalities (94) and (95) come from the reformulation of inequality (93) with $\varepsilon <
\max_{a \in \mathcal{A}} |u_k(a)|$ and $\varepsilon < 1$.

Inequality (96) comes from Lemma 3, which provides an approximation of the utility
associated with the strategies $(\sigma^*, \tau^*) \in \Sigma \times \mathcal{T}$.

Inequality (97) comes from Lemma 4, which provides an upper bound on the deviation
gain obtained by player $k \in \mathcal{K}$ while playing the strategy $\tau_k \in \mathcal{T}_k$.

Inequality (98) comes from Lemma 5, which is a consequence of the coding result stated by
Theorem 1. This result provides an upper bound on the utility of player $k \in \mathcal{K}$ during the
punishment phase.

Inequality (99) comes from the fact that $b-1 \in \mathcal{B}$ is the first block on which the action
sequence $a_k(b-1) \notin A_\varepsilon^{*n}(\mathcal{P}_k)$ of player $k \in \mathcal{K}$ is not typical.

Inequality (100) comes from the definition of the utilities of the $T$-stage repeated game stated
in equation (12).

This proves that the strategies $(\sigma^*, \tau^*) \in \Sigma \times \mathcal{T}$ satisfy the equilibrium condition stated by point
(ii) of Definition 6.



B.3.2 Second case: typical deviations 

Let us fix $\varepsilon > 0$ and suppose player $k \in \mathcal{K}$ uses a deviating strategy $\tau_k \in \mathcal{T}_k$ such that the
action sequence of player $k \in \mathcal{K}$ over each block $b \in \mathcal{B}$ belongs to the set of typical sequences,
$a_k(b) \in A_\varepsilon^{*n}(\mathcal{P}_k)$. From the proof of Lemma 2, for all $\epsilon > 0$, there exists $n_1 \in \mathbb{N}$ such that
for all $n > n_1$ the implication (102) holds. Taking $\epsilon \cdot \max_{a \in \mathcal{A}} |u_k(a)| \leq \varepsilon$, we
obtain the implication (103).

e AfiV-") (101) 

< £ • max|ufc(a)[. (102) 

^ "/Ii'^\riT*,) <^lia*,T*,T*_,) + e. (103) 

The utility provided by the strategies $(\sigma^*, \tau^*) \in \Sigma \times \mathcal{T}$ satisfies the equilibrium condition
stated by point (ii) of Definition 6.



Mfc(a) 



B.4 Conclusion 

We showed that setting the parameter $\varepsilon > 0$ yields a condition on the number of
blocks $B \in \mathbb{N}$ given by (91), then a condition on the coding parameter $\epsilon > 0$ given by (71),
and then a condition on the block length $n > n_1$ given by (42) and (43).
For all $U \in u(\mathcal{R}) \cap \mathrm{IR}$, these parameters allow us to construct a pair of strategies $(\sigma^*, \tau^*) \in
\Sigma \times \mathcal{T}$ over $T = n \cdot B$ stages that satisfies conditions (27) and (28). By repeating
these strategies cyclically, we show that any utility vector $U \in \operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$ satisfies
both conditions (i) and (ii) of the uniform equilibrium (Definition 6). The utility $U \in
\operatorname{conv} u(\mathcal{R}) \cap \mathrm{IR}$ is therefore a uniform equilibrium utility of the infinitely repeated game $\Gamma^\infty$.
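
As a purely illustrative aid to the cyclic construction above, the following Python sketch maps a stage index of the infinitely repeated game to the cycle, the block and the within-block position it falls in, when a strategy defined over $T = n \cdot B$ stages is repeated. The function name `locate_stage` and the 1-indexed conventions are assumptions made for this illustration; they do not appear in the proof.

```python
def locate_stage(t, n, B):
    """Map a 1-indexed stage t to (cycle, block, position), all 1-indexed.

    Assumes each cycle lasts T = n * B stages and each block lasts n stages.
    """
    if t < 1 or n < 1 or B < 1:
        raise ValueError("t, n and B must be positive integers")
    T = n * B
    # Which repetition of the T-stage strategy, and the offset inside it.
    cycle, offset = divmod(t - 1, T)
    # Which block inside the cycle, and the position inside that block.
    block, position = divmod(offset, n)
    return cycle + 1, block + 1, position + 1
```

For instance, with blocks of length 3 and 2 blocks per cycle, stage 7 is the first stage of the first block of the second cycle.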



C Review of typical sequences 



The achievability parts of the coding theorems are based on the properties of typical
sequences. This section recalls these notions, which can also be found in
[Cover and Thomas(2006)] and [Csiszár and Körner(1981)].

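Definition 9 (Typical sequences [Csiszár and Körner(1981)]) Let $Q \in \Delta(\mathcal{X} \times \mathcal{Y})$ be
a probability distribution over $\mathcal{X} \times \mathcal{Y}$. The typical sequences and the conditional typical
sequences are defined as follows:

\begin{align}
A_\varepsilon^{*n}(X) &= \Big\{ x^n \in \mathcal{X}^n :\ \sum_{x \in \mathcal{X}} \Big| \frac{N(x|x^n)}{n} - Q(x) \Big| \leq \varepsilon,\quad \forall x \in \mathcal{X},\ Q(x) = 0 \Rightarrow N(x|x^n) = 0 \Big\}, \nonumber \\
A_\varepsilon^{*n}(X|y^n) &= \Big\{ x^n \in \mathcal{X}^n :\ \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} \Big| \frac{N(x,y|x^n,y^n)}{n} - Q(x,y) \Big| \leq \varepsilon,\quad \forall (x,y) \in \mathcal{X} \times \mathcal{Y},\ Q(x,y) = 0 \Rightarrow N(x,y|x^n,y^n) = 0 \Big\}. \tag{104}
\end{align}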
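
To make Definition 9 concrete, the following Python sketch tests whether a finite sequence is typical. It assumes that $N(x|x^n)$ counts the occurrences of the symbol $x$ in $x^n$ and that this count is normalized by $n$ before being compared with $Q(x)$. The function names and the dictionary representation of $Q$ are illustrative choices, not notation from the paper.

```python
from collections import Counter


def is_typical(seq, Q, eps):
    """Check eps-typicality of seq for a distribution Q given as {symbol: probability}.

    Symbols missing from Q are treated as having probability zero (an assumption).
    """
    n = len(seq)
    if n == 0:
        raise ValueError("the sequence must be non-empty")
    counts = Counter(seq)
    # Second condition: a symbol with zero probability must never occur.
    if any(Q.get(x, 0.0) == 0.0 for x in counts):
        return False
    # First condition: total deviation between empirical frequencies and Q.
    deviation = sum(abs(counts.get(x, 0) / n - q) for x, q in Q.items())
    return deviation <= eps


def is_conditionally_typical(xs, ys, Q_joint, eps):
    """Check whether xs is typical given ys, using the joint distribution Q_joint.

    Q_joint maps pairs (x, y) to probabilities; the test is applied to the
    sequence of pairs, mirroring the second set in the definition.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return is_typical(list(zip(xs, ys)), Q_joint, eps)
```

With a uniform distribution on two symbols, a perfectly balanced sequence passes the test for any tolerance, whereas a sequence containing a zero-probability symbol is rejected regardless of the tolerance.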



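Lemma 6 (Properties of the typical sequences [Csiszár and Körner(1981)]) Let
$Q \in \Delta(\mathcal{X} \times \mathcal{Y})$ be a probability distribution, $Q^{\otimes n}$ the $n$-fold product of this probability distribution
and $y^n \in A_\varepsilon^{*n}(Y)$ a typical sequence. For all $\varepsilon > 0$, there exists $n \in \mathbb{N}$ such that:

\begin{align}
1 - \varepsilon &\leq Q^{\otimes n}\big( x^n \in A_\varepsilon^{*n}(X) \big), \tag{105} \\
1 - \varepsilon &\leq Q^{\otimes n}\big( x^n \in A_\varepsilon^{*n}(X|y^n) \,\big|\, y^n \big), \tag{106} \\
2^{n(H(X) - c\varepsilon)} &\leq |A_\varepsilon^{*n}(X)| \leq 2^{n(H(X) + c\varepsilon)}, \tag{107} \\
2^{n(H(X|Y) - c\varepsilon)} &\leq |A_\varepsilon^{*n}(X|y^n)| \leq 2^{n(H(X|Y) + c\varepsilon)}, \tag{108}
\end{align}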



where c = log 



max, 6^ 



is constant. 
This result states that an i.i.d. sequence of symbols is typical with probability approaching one as $n$ goes to
$+\infty$, and it provides upper and lower bounds on the size of the sets of typical sequences.
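
The two statements of the lemma can be checked numerically on a small example. The sketch below is an illustration only, under the assumption of a binary i.i.d. source with $Q(1) = p$ and $0 < p < 1$: it computes exactly the size of the typical set and its probability, so that the normalized logarithm of the size can be compared with the entropy $H(X)$. The function names are hypothetical and the constant $c$ of the lemma is not evaluated.

```python
from math import comb, log2


def binary_typical_set_stats(n, p, eps):
    """Return (size, probability) of the eps-typical set of a Bernoulli(p) source.

    For a binary alphabet, a sequence with k ones has total deviation
    |k/n - p| + |(n-k)/n - (1-p)| = 2 * |k/n - p|, so typicality depends on k only.
    """
    size = 0
    prob = 0.0
    for k in range(n + 1):
        if 2 * abs(k / n - p) <= eps:
            size += comb(n, k)  # number of sequences with exactly k ones
            prob += comb(n, k) * p**k * (1 - p) ** (n - k)
    return size, prob


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source, for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


if __name__ == "__main__":
    n, p, eps = 200, 0.3, 0.1
    size, prob = binary_typical_set_stats(n, p, eps)
    print("probability of the typical set:", prob)
    print("log2(size)/n:", log2(size) / n, "entropy:", binary_entropy(p))
```

Increasing `n` at fixed `eps` should push the probability of the typical set toward one, while `log2(size) / n` stays within a margin of the entropy that shrinks with `eps`.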

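Lemma 7 (Packing Lemma [El Gamal and van der Meulen(1981)]) Let $Q \in
\Delta(\mathcal{U} \times \mathcal{V})$ be a correlated probability distribution, $Q_U$ (resp. $Q_V$) the marginal induced by
$Q$ over $\mathcal{U}$ (resp. $\mathcal{V}$), and $Q_U^{\otimes n}$ and $Q_V^{\otimes n}$ the $n$-fold products of the marginal probabilities. Let $R_1$ and $R_2$ be
real numbers,

• $(U_i^n)_{i \in \{1,\ldots,2^{nR_1}\}}$ a family of sequences drawn with $Q_U^{\otimes n}$,

• $(V_j^n)_{j \in \{1,\ldots,2^{nR_2}\}}$ a family of sequences drawn with $Q_V^{\otimes n}$.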
If the condition (109) is satisfied. 



where $I_Q(U;V)$ denotes the mutual information [Cover and Thomas(2006)] with respect to
the probability distribution $Q$.